ON A CLASSIFICATION OF THE PROBLEMS
OF STATISTICAL INFERENCE 


By W. Edwards Deming
Bureau of the Census 


This paper springs from an attempt to set forth the factors that give
rise to scientific decisions for action, and to show how the results of
experiments and surveys (in other words, the numerical data of nature)
are used as part, but only part, of the evidence required for these de-
cisions. An analysis of the factors that influence scientific decisions and
recommendations for action, and an attempt to state the responsi-
bilities of the statistician, bring about a useful classification of the
problems of statistical inference. This classification has been helpful in
the design of enumeration and tabulation procedures, both for samples
and complete counts, and it applies equally well in industry and in the
natural and social sciences. The usefulness of mathematical statistics,
or of any other method of inference, will be measured by its ability to
assist judgment in making better predictions, and making them oftener.
The classification made here provides a guide for evaluating the useful-
ness of mathematical statistics in specific problems, on the grounds just
stated.


DATA ARE FOR PREDICTION AND ACTION 


The ultimate purpose of taking data is action. Scientific data are not 
taken for museum purposes; they are taken as a basis for doing some- 
thing. If nothing is to be done with the data, then there is no use collect- 
ing any. The ultimate purpose of taking data is to provide a basis for 
action or a recommendation for action. The step intermediate between 
the collection of data and the action is prediction. 


Every empirical statement of science is a prediction. It is a philosophic 
commonplace that every empirical statement in science has temporal 
spread, and partakes the nature of a prediction. There is no scientific 
interest in any measurement or empirical relationship that does not 
help to explain what will happen or has happened at another time or 
another place.¹


1 C. I. Lewis, Mind and the World-Order (Scribners, 1929), pp. 129, 132, 195. A mathematical result
is not put to a test by what actually happens in nature; it is not empirical.

An operation of measurement (experiment or survey, carried out by
sampling or complete count), if it is repeated over and over, creates a 
sequence of results, commonly known as a population of measurements. 
Any one measurement is but one term in a sequence of terms, and this 
sequence of terms actually or theoretically might be extended by re- 
peated applications of the operation. Not this one term, but its relation 
to the rest of the sequence is the point of interest. When you say that the 
length of this table is 6 feet you make a prediction; you imply that 
anyone repeating this or any accepted method of measurement will 
also find the length of this table to be 6 feet within limits that you must 
specify, depending on the requirements. 

The publication of a measurement is in two ways a prediction with 
regard to measurements not yet taken; first, it is a prediction with re- 
gard to repeated applications of this one method (more terms of this 
series) ; and second, by implication at least, it is a prediction with regard 
to repeated applications of other methods (terms of other series). 

The announcement of a functional relationship is likewise a predic- 
tion. A curve fitted to a set of points is of interest, not on account of the 
data fitted, but because of data not yet fitted. How will this curve fit 
the next batch of data?


ACTION, EVIDENCE, AND THE STATISTICIAN’S JOB 


What constitutes evidence? What constitutes a statistical method? When 
a problem arises, demanding action, action will be taken. The scientific 
attitude is to base the action on rational predictions and the degree of 
belief associated with these predictions, as well as on the possible con- 
sequences of different courses of action. The degree of belief in any 
prediction will depend on the evidence available: a change in evidence 
will change the degree of belief, and hence ultimately, possibly the 
action also. The amount of money that should be spent collecting evi- 
dence depends on the hazards of the action. In some circumstances, 
indications afforded by a scant amount of data will suffice as evidence 
for the action required. In more hazardous circumstances, a vast 
amount of data may be needed as a basis for action. Of course, there is 
not always time to collect the evidence that is really needed, as when 
action must and will be taken at a certain time, whether or no. 

Information that does not affect the degree of belief is irrelevant to 
the purpose and is not evidence. Likewise, any method of inference 
that does not help to predict, or which does not affect the degree of 
belief in some prediction that needs to be considered, is irrelevant to 
the purpose. A method that is useful in one set of requirements may 
not be in another.

An experiment or survey (such as a census) should be designed to
provide evidence on which to evaluate the degree of belief in the pre- 
dictions that are assumed to be useful in formulating a course of action 
for a problem that is faced. A single experiment or survey, however, 
will rarely if ever constitute all of the available evidence. No evidence 
from other experience dare be ignored if it affects the degree of belief 
in a prediction that needs to be considered in deciding an important 
course of action. 


Action tests the prediction. If there is to be no action, and hence no 
test, then any prediction, regardless of evidence, may safely be made. 
On the other hand, if the consequences of taking the wrong action will 
be costly in time, money, materials, comfort, or prestige, then the con- 
sequences of the action will be carefully weighed along with the degree 
of belief and the evidence for the prediction. The dependence of action 
on prediction, the degree of belief associated with the prediction, and 
the consequences of the action, can be put in the accompanying diagram. 


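[Diagram: ACTION depends on the POSSIBLE CONSEQUENCES OF THE ACTION and on the PREDICTION with its DEGREE OF BELIEF; PREDICTION, DEGREE OF BELIEF, and EVIDENCE form the lower triangle.]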


Diagram showing that the action adopted depends on the possible consequences of the action, and 
on a prediction and the degree of belief associated with the prediction. The degree of belief in a predic- 
tion depends on evidence. The lower triangle is the Shewhart diagram of the three components of knowl- 
edge—evidence, prediction, and degree of belief. 
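
The dependence shown in the diagram can be made concrete with a small sketch. It assumes, as one conventional reading that the text itself does not prescribe, that the degree of belief is expressed as a probability and that the consequences are expressed as numerical losses, so that the action with the smallest expected loss is the one adopted. The action names and the loss figures are hypothetical.

```python
def expected_loss(belief, loss_if_holds, loss_if_fails):
    """Expected loss of one action, given the degree of belief (0..1)
    that the prediction will hold."""
    return belief * loss_if_holds + (1.0 - belief) * loss_if_fails


def choose_action(belief, losses):
    """Return the action with the smallest expected loss.

    `losses` maps an action name to a pair:
    (loss if the prediction holds, loss if the prediction fails).
    """
    return min(losses, key=lambda action: expected_loss(belief, *losses[action]))


# Hypothetical prediction: "this lot will prove acceptable in use."
# Shipping costs nothing if the prediction holds and 100 units if it fails;
# reinspection costs 5 units either way.
losses = {"ship": (0.0, 100.0), "reinspect": (5.0, 5.0)}

strong_evidence = choose_action(0.99, losses)  # expected loss of shipping: 1.0
weak_evidence = choose_action(0.90, losses)    # expected loss of shipping: 10.0
print(strong_evidence, weak_evidence)
```

With these figures the same prediction leads to different actions as the evidence, and hence the degree of belief, changes; raising the loss attached to a wrong shipment would in turn raise the degree of belief demanded before shipping.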

The statistician’s job. It is the statistician’s job to make rational pre-
dictions concerning measurements yet to be made. He also helps other
people make predictions by providing evidence (e.g., collecting data), 
and evaluating this evidence. Anyone who makes a statement that is 
based partly or wholly on data of nature is in some respect a statisti- 
cian, but the statistician is expected to make better predictions oftener 
than other people and to know what confidence he may have in his 
predictions; that is his business. Since action will be based on his pre- 
dictions, and thus put them to a test, he is vitally interested in the con- 
sequences of his predictions, hence in the evidence and degree of belief 
in them. He cannot afford to be wrong in the wrong place. A prediction 
that cannot be put to a test, at least theoretically, or which is not in- 
tended to be put to a test, is not a scientific statement, and does not 
require a statistician; anyone can make such predictions. The statis- 
tician’s job therefore appears to be fourfold: 


i. To plan the collection of data to provide evidence for what- 
ever predictions must be considered in the decision for action. 

ii. To describe the method by which the data were collected, and 
to present the data by summaries and comparisons. This 
presentation must be carried out so that the degree of belief, 
and hence the action taken, will be the same on the basis of 
the summary as it would be on the basis of the original data.²

iii. To make predictions as a basis for action. This will require 
knowledge of the subject matter, and perhaps mathematical 
statistics (vide infra). 

iv. To make recommendations for action. In so doing, the statis- 
tician will take account of all available evidence and the degree 
of belief in the predictions that need to be considered; he will 
take account also of the consequences of each possible course 
of action. 

TWO TYPES OF STATISTICAL PROBLEMS 


Description of the two types. In my own work I have found it useful 
to distinguish between two types of problems that confront the statis- 
tician in his job of making predictions: 


Type A. Problems in which action is based on a prediction regard- 
ing future measurements of a product already in existence. The 
evidence comes from measurements already made on the thing 
itself, or on samples thereof. 


2 This is merely a restatement of Shewhart’s rule 2, given in his Statistical Method from the Viewpoint 
of Quality Control (The Graduate School, Department of Agriculture, Washington, 1939), p. 92.

Type B. Problems in which action is based on a prediction regard-
ing future measurements of a product not yet subject to measure-
ment, perhaps not even produced yet. The evidence comes from 
measurement made on other product originating from the same 
or similar processes. 


In both types of problems the ultimate purpose is action. In problems 
of Type A, some action is to be based on the measurements of a prod- 
uct, or samples thereof. The Type A problem might be described as the 
problem of measuring something. Interest centers in the product as it is, 
not on how it got that way, or what it ought to be or might have been. 
The action taken may be, and often is, action on the thing that is 
measured (see the example described below). The evidence for the pre- 
diction is furnished by (i) measurements that have been made on this 
particular product; (ii) previous experience with the method of meas- 
urement that is used; and (iii) previous experience with the method of 
sampling, if sampling is used. 

In contrast, in problems of Type B, the action is based on predictions 
regarding future measurements of some product not yet subject to 
measurement, perhaps not even produced yet. Interest centers in the 
process, the underlying cause system of forces (social, economic, me-
chanical, physiological, chemical, geological, biological), that give rise
to yesterday’s, today’s, and tomorrow’s product. It is through a study 
of the underlying process and what it has produced in the past that 
one is able to predict the product of the future—more generally, the 
results of measuring a product not yet subject to measurement. Studies 
of Type B are carried out with the aid of measurements made on past 
product, originating from basically the same or related processes. In
some lines of work (notably industry and agriculture) these studies 
may lead to a modification of the process itself. 


An example. The distinction between Type A and Type B can be 
illustrated by consideration of an industrial product. Each lot is in- 
spected in order to determine what disposition shall be made of it. 
Disposition of the product will be made in accordance with certain 
rules of action, action on each lot being taken, we shall presume, on the 
basis of the quality of that lot and that lot alone, regardless of what its 
predecessors have been. The disposition may be to pass it, accept, re- 
ject, regrade, sort, or repair it. This is a Type A problem, because the 
action depends on what the lot is, not on what it might have been, or 
what is expected of future lots. The easiest and cheapest way to inspect 
the lot may be to measure all of it: if this is too costly or otherwise im-
practicable, sampling will be used, possibly with elaborate sampling
design, calculation, and experimentation. Schemes of double sampling 
may be helpful.³

The lots are inspected for another purpose also. It is desirable to 
keep an eye on the production process, to forestall an inordinate per- 
centage of rejections in the future, and to point the way to effecting im- 
provements or desired changes in future quality. This is a Type B 
problem, and would be carried out by every possible means to attain 
the end. Investigations will be made into the underlying mechanical and 
chemical forces that make the product what it is. The quality of past 
product, made available by Type A investigations, will be brought into
the study. It may be profitable to watch the hourly variations of 
quality, as observed in samples of current product (cf. the section 
“Some remarks on quality control in industry”). Such studies are di- 
rected primarily for action on the process, not for the disposition of any 
particular lot of product. 

Similar analogies can be drawn from social and economic studies. 
Type B problems constitute a large part of science, natural and social. 
The establishment of any causal or functional relationship, for instance, 
is a Type B problem. The ultimate goal in establishing a relationship 
is to arrive at a theory or formula that will hold, with stated limitations, 
for data not yet taken. Varietal and treatment tests in agriculture, for 
instance, are not made just to determine which was best under certain 
conditions (Type A), but rather to help decide which will be best, and 
under what conditions (Type B).⁴


NOTES REGARDING THE USE OF MATHEMATICAL STATISTICS 


Prediction, mathematical statistics, and the prerequisite of statistical 
stability. As the operation of measuring a product is repeated over and 
over, whether on samples or the entire lot, a population consisting of a 
sequence of terms is generated. Each term represents a measurement, 
or a function of several measurements. The Type A problem is to find 
a method by which predictions concerning future terms of this sequence 
can be made with the highest possible degree of belief. This is the prob- 
lem that the statistician is called upon to answer wherever the results 
of measurement and sample surveys form the basis of planning or other 
action. He is expected to answer it if possible, and to know when he 
can answer it. When he can, he is expected to give a better answer 
oftener than other people can give. 


3 See, for example, H. F. Dodge and H. G. Romig, “Single sampling and double sampling inspec-
tion tables,” Bell System Technical Journal, Vol. XX, January 1941, pp. 1-61.

4 Page 31 of the 2d edition of Ezekiel’s Methods of Correlation Analysis (John Wiley, 1941) is recom-
mended at this point for an extension of these remarks.

In the state of randomness or statistical stability, the sequence will
appear to have been drawn blindfolded and with replacement from a 
bowl of physically similar numbered chips. In this state, a distribution 
formed from terms of the sequence is stable, and it is then that predic- 
tions regarding future terms can be made by mathematical statistics 
with the highest attainable degree of belief. The Shewhart criterion⁵
is valuable in assisting judgment regarding randomness. 

When the criterion for randomness is not met sufficiently to warrant 
the application of mathematical statistics for predictions, the inter- 
pretation of the results of the survey or experiment must lean heavily 
upon the statistician’s knowledge of the subject matter, and his ability 
to cooperate with experts. 


The fundamental problem of mathematical statistics. The fundamental 
problem of mathematical statistics is to set fixed limits within which 
percentages of the next (e.g.) 1,000 terms of a random sequence will 
fall, and to set these limits efficiently from terms generated in the past. 
The problem is evidently one of prediction. The Shewhart methods 
were devised to aid judgment in deciding whether the sequence is suf-
ficiently near random to permit the fundamental problem to be at-
tempted, and as an aid in solving it when it can be solved. Wilks⁶ has
presented a pioneer piece of theoretical work on the subject. Confidence 
intervals and fiducial probability can not fully answer the purpose 
because the interval in the fundamental problem is a tolerance interval 
of specified width; it is not a random variable, but is fixed.⁷ True, the
random intervals in the theories of confidence intervals and fiducial 
probability become steady in large samples, but a large sample does 
not by itself exhibit evidence for or against stability (randomness) 
until it is broken down into rational subseries, as it will be by the ap- 
plication of the Shewhart methods. (Cf. the section, “Repeated pat- 
terns in subseries.”) 
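
For orientation on the kind of calculation involved in setting tolerance limits, the following sketch uses the standard distribution-free result for the range of a random sample: if the smallest and largest of n observations are taken as limits, the chance that they enclose at least a fraction gamma of the population is 1 - n*gamma^(n-1) + (n-1)*gamma^n. The formula is stated here from general statistical knowledge, not quoted from the paper cited in the text, and the function names are hypothetical. Note two assumptions: the sequence must be random (statistically stable) and continuous, and limits of this kind are themselves random, whereas the text stresses limits that are fixed.

```python
def coverage_confidence(n, gamma):
    """Probability that the range (smallest to largest) of n random
    observations encloses at least a fraction gamma of the population.
    Assumes a continuous distribution and a random (stable) sequence."""
    return 1.0 - n * gamma ** (n - 1) + (n - 1) * gamma ** n


def smallest_sample_size(gamma, confidence):
    """Smallest n for which the sample range encloses at least gamma of
    the population with probability at least `confidence`."""
    n = 2
    while coverage_confidence(n, gamma) < confidence:
        n += 1
    return n


n_needed = smallest_sample_size(gamma=0.95, confidence=0.95)
print(n_needed, round(coverage_confidence(n_needed, 0.95), 4))
```

For 95 per cent of the population with 95 per cent assurance, the search returns 93 observations. The answer says nothing about whether the sequence was in fact random; that question is the one the Shewhart methods address.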

As a simple illustration, let p be the ratio of white to black chips 
found in a sample that is drawn from a bowl containing white and black
chips. When the sampling is random, with replacement, the average p
will approach a statistical limit p₀ as the number of samples increases.
The fundamental problem of mathematical statistics is to name the 
proportion of the next 1,000 samples for which p will lie within the 


5 Walter A. Shewhart, The Economic Control of Quality of Manufactured Product (Van Nostrand,
1931), Ch. XX.

6 S. S. Wilks, “Determination of sample sizes for setting tolerance limits,” Annals of Mathematical
Statistics, March 1941, pp. 91-96.

7 The distinction between the different kinds of prediction is illustrated on p. 59 and elsewhere in
Shewhart’s Statistical Method from the Viewpoint of Quality Control (The Graduate School, Department
of Agriculture, 1939); also in fig. 11 of Deming and Birge’s Statistical Theory of Errors (The Graduate
School, Department of Agriculture, 1934, 1938).
interval p₀ ± a. Whether p₀ is the same as the ratio of white to black
chips in the bowl is still another problem (next section). 
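
The bowl illustration can be imitated by a short simulation. This is a sketch under stated assumptions: p is taken as the proportion of white chips in each sample, the bowl composition (0.30), the sample size (100 chips), the half-width a (0.09), and the random seed are all hypothetical, and the pseudo-random generator stands in for the blindfolded drawing with replacement.

```python
import random


def sample_proportion(rng, p_bowl, n_draws):
    """Proportion of white chips among n_draws chips drawn at random,
    with replacement, from a bowl whose white proportion is p_bowl."""
    whites = sum(1 for _ in range(n_draws) if rng.random() < p_bowl)
    return whites / n_draws


def fraction_within(rng, p_bowl, n_draws, p0, a, n_samples=1000):
    """Fraction of the next n_samples samples whose p falls in p0 +/- a."""
    inside = 0
    for _ in range(n_samples):
        p = sample_proportion(rng, p_bowl, n_draws)
        if abs(p - p0) <= a:
            inside += 1
    return inside / n_samples


rng = random.Random(1942)  # hypothetical seed, for repeatability only

# Past terms of the sequence: 200 samples of 100 chips each.
past = [sample_proportion(rng, 0.30, 100) for _ in range(200)]
p0 = sum(past) / len(past)  # empirical stand-in for the statistical limit

# The prediction is put to the test on the next 1,000 samples.
frac = fraction_within(rng, 0.30, 100, p0, a=0.09)
print(round(p0, 3), frac)
```

Because the simulated drawing is random by construction, the fraction found in the next 1,000 samples agrees closely with what distribution theory predicts from the past terms. The simulation cannot show whether a real sampling operation is random, nor whether p₀ agrees with the composition of a real bowl.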


Knowledge of the subject matter essential in the Type A problem. Let p’ 
be the ratio of white to black chips in the bowl. It can be determined 
only by taking all the chips out of the bowl and examining each one, 
so in practical problems of sampling it remains unknown. Even when 
the sampling is carried out by a random operation, and the statistical 
limit p₀ exists, it remains to decide whether the value of the unknown
p’ is anything like p₀. To answer this question in real surveys the
statistician must be concerned with the subject matter or even with 
psychology perhaps more than with mathematics. For instance, he 
must recognize the fact that the sampling and measuring may introduce 
biases, as for instance when the people in the sample behave differently 
(change characteristics) just because they in particular are under ob- 
servation and the others are not. It will not do to issue one statement 
(prediction) as a statistician, and some other statement as an econo- 
mist, population or social expert, agricultural expert, geologist, chemist, 
engineer, or anything else. 

Sources of discrepancies in different surveys may arise from differ- 
ences in definition and procedure: the auspices, frequencies of interro- 
gation, training of the enumerators, supervisors, and editors, inform- 
ants, pay and time allowed for the interviews, volumes of questions, 
way of asking the questions, and a host of other sources of discrepancy, 
will all affect the results. That large discrepancies can arise from small 
changes in the definitions and procedure is often not appreciated until 
a survey is repeated after a short time interval with as few changes in 
procedure as possible, or until it is compared with a similar one taken 
under different auspices at about the same time. If the surveys are 
carried out by sampling, there will of course be sampling errors to con- 
tend with, but sampling errors may be less troublesome than some of 
the other difficulties just mentioned. Increasing the samples to 100 per 
cent will eliminate the sampling errors, but not the discrepancies arising 
from differences in definition and procedure. Knowledge of the subject 
matter, and an appreciation of the limitations of measurement, are 
therefore necessary in the interpretation of results, whether they are 
obtained by complete counts or by samples. 
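
The contrast between sampling error and discrepancies of definition can be sketched numerically. The sketch assumes, purely for illustration, that a difference in definition or procedure acts as a constant shift of 2 units added to every response; real discrepancies need not be constant or additive. The population, the shift, and the function name are hypothetical.

```python
import random


def survey_mean(population, fraction, definition_shift, rng):
    """Mean reported by a survey that draws the given fraction of the
    population at random (without replacement) and records every value
    with a constant shift caused by its definitions and procedure."""
    n = max(1, round(fraction * len(population)))
    sample = rng.sample(population, n)
    return sum(x + definition_shift for x in sample) / n


rng = random.Random(7)  # hypothetical seed, for repeatability only
population = [rng.gauss(50.0, 10.0) for _ in range(10_000)]
true_mean = sum(population) / len(population)
shift = 2.0  # hypothetical effect of a change in definition

for fraction in (0.01, 0.10, 1.00):
    estimate = survey_mean(population, fraction, shift, rng)
    print(f"{fraction:5.0%} sample: discrepancy {estimate - true_mean:+.3f}")

complete_count = survey_mean(population, 1.00, shift, rng)
```

At 100 per cent the sampling error vanishes, yet the discrepancy settles at the full amount of the shift rather than at zero.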

Similar remarks apply to the physical sciences, wherein differences 
in the various instruments, procedures, definitions, states of wear and 
tear on the instruments, and the theoretical relations that are assumed 
to exist between the different quantities, play the counterpart of the 
differences in the procedures and definitions in the social sciences.

Knowledge of the subject matter essential in the Type B problem. The
study of any Type B problem will require one or more investigations of 
Type A as part of the evidence for the action that is required. The 
necessity for knowledge of the subject matter in the interpretation of 
a Type A experiment still exists when the experiment is carried out as 
part of a Type B study. 

Other evidence for a Type B prediction may come from studies of the 
underlying process from the standpoint of sociology, economics, me- 
chanics, chemistry, geology, biology, or whatever may be involved. 
Such studies of course require considerable knowledge of the subject 
matter, but they may take the place of a vast amount of experimental 
data of Type A. Moreover, in the Type B problem it is a matter of 
judgment and knowledge of the subject matter to state the range of 
validity of a relationship, and to decide when enough situations have 
been covered to establish this validity with a sufficiently high degree 
of belief for the action required. Judgment is the result of scientific 
training—intuition if you like; but intuition, like the conscience, must 
be trained. 

The importance of the design of experiments. Progress in a Type B 
study is enhanced if the Type A experiments necessary thereto are 
carried out with the greatest possible efficiency. The importance of the 
theory of complex experiments and other branches of mathematical 
statistics in the design of sample surveys and experiments can hardly 
be over-emphasized. However, it is one thing to design a sampling plan 
or experiment so that it ought to exhibit randomness and give results 
with a certain variance, but it is another matter to show that the 
results have the validity that was hoped for. 


A word on the presentation of results. An analysis and evaluation of 
data, to show how closely the operation of sampling satisfies the require- 
ments intended, ought never to be omitted in the presentation of the 
results of a Type A investigation. This analysis is part of the evidence, 
and is particularly important if mathematical statistics is used in the 
interpretation of the results. Without it, the reader does not know what 
confidence to place in the predictions that are made, and what action 
dare be recommended on the basis of them. Often when such analyses 
and evaluations have been made, it has been found that the sampling 
method did not work as intended; and what is more, reasons for spuri- 
ous results are often found, with consequent improvement in inter- 
pretation. A sampling plan ought always to be designed so that the 
sample can be broken up into small samples, geographically or tem-
porally or both: samples that may be too small for publication, yet
large enough to be compared with one another and examined for ran-
domness and patterns (next section).


Repeated patterns in subseries. Significance. A pattern that is repeated 
under a wide variety of conditions may provide evidence for a degree 
of belief sufficient to justify a hazardous course of action in a Type B 
problem, even in the absence of a rational theory regarding the under- 
lying causes. A repeated pattern may attain scientific significance, even 
though no one of the patterns by itself would seem worthy of note. 
Thus, if treatment C has invariably been found better than treatment 
D under a wide variety of soils and climates, favor toward treatment 
C would not rest so much on any “significance” calculated from either 
a single experiment or a combination of experiments, as on the apparent 
ability of treatment C to maintain superiority under any likely set of 
conditions. Other examples are afforded by numerous empirical laws. 
The necessity for breaking up a large sample and studying the patterns 
in small rational subseries (as in order of time) is in fact the kernel of 
the Shewhart methods,⁸ which are as applicable in the Type B prob-
lems of the social sciences as they are in industry.

Occasionally, in an extreme case of a Type B problem, our knowledge 
of the underlying forces, however derived, may be so extensive that in 
view of the consequences of the action, it is sufficient to perform but 
one (Type A) survey or experiment, which will then constitute evidence 
for prediction with a high enough degree of belief to point to the course 
of action required. Under such circumstances, a single test of signifi- 
cance, for instance, or the calculation of a single confidence interval, 
may suffice for action. A single experiment, or a single test of signifi- 
cance, may also suffice in circumstances where the consequences of the 
action are not critical, even though knowledge of the underlying causes 
is not extensive: it may not be worth while to get more evidence. Of 
course, as was mentioned earlier, there are times when action must be
taken without sufficient evidence. 


EFFECT OF THE CLASSIFICATION ON THE 
PLANNING OF ENQUIRIES 


Necessity for keeping in mind the ultimate objectives of an enquiry; 
are they Type A or Type B? Recognition of the two types of problems 
and methods calls for discrimination in the planning of any survey or 
experiment. Its aims must be kept in mind. What will the data be used 
for? Is the problem Type A or Type B? This classification will affect 


8 This point is also insisted on by Keynes; confer his Probability (Macmillan, 1929), pp. 407-08. 
its design, the funds needed and how they should be spent, the amount
of detail required in tabulation, the areas of tabulation, whether 
samples will best serve the purpose, and if so, what size of sample is 
required, and—most important—how the samples should be timed and 
distributed geographically. For example, where the action by definition 
or law depends on the state of a population as it exists on a certain 
date (as for allocation of funds by states, or conscription by age groups), 
the data required are purely for Type A purposes. To provide the 
necessary detail and accuracy it will be necessary to collect the data by 
a large enough sample, and if extreme detail is required, by a complete 
count. On the other hand, in a study of relationships, which would 
necessarily be a problem of Type B, it may be wise to carry out numer- 
ous small-sized experiments or surveys, spaced temporally and geo- 
graphically in order to cover a wide variety of conditions (e.g., other 
city size groups, other climates, other soils), and thus obtain patterns 
in subseries. In the Type B problem, all experimental data, even com- 
plete counts, are but samples of what the underlying system of forces 
can and will produce.⁹


Some remarks on quality control in industry. Attaining control of 
quality during production is a Type B problem. The essential feature 
of the Shewhart statistical method¹⁰ is to break up the inspection data
into small rational subgroups in order of production, so that a large 
batch of data is studied, not simply as a large sample, but as an ordered 
sequence of small samples. Quick examination of the samples is made, 
and the results are plotted on a control chart so that action can be taken 
at once when the chart shows that something has gone wrong with the 
process of manufacture. Small samples suffice, because they are not for 
lot by lot evaluation (Type A), but for control of the process (Type B). 
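
The procedure just described can be sketched as follows. This is a simplified illustration, not the Shewhart procedure as published: the limits are set at three standard errors about the grand average, and the standard deviation is estimated by pooling the within-subgroup variances, whereas practical control charts commonly rely on tabulated constants applied to subgroup ranges or standard deviations. The measurements and function names are hypothetical.

```python
import statistics


def rational_subgroups(measurements, size):
    """Split measurements, kept in order of production, into consecutive
    subgroups of the given size (an incomplete final group is dropped)."""
    usable = len(measurements) - len(measurements) % size
    return [measurements[i:i + size] for i in range(0, usable, size)]


def average_chart(subgroups):
    """Return (center, lower, upper, signals) for a chart of subgroup
    averages; signals lists the subgroups falling outside the limits."""
    size = len(subgroups[0])
    means = [statistics.fmean(group) for group in subgroups]
    center = statistics.fmean(means)
    # Pooled within-subgroup variance: an assumption of this sketch.
    within_var = statistics.fmean([statistics.variance(g) for g in subgroups])
    half_width = 3.0 * within_var ** 0.5 / size ** 0.5
    lower, upper = center - half_width, center + half_width
    signals = [i for i, m in enumerate(means) if m < lower or m > upper]
    return center, lower, upper, signals


# Hypothetical inspection data in order of production: a steady process
# with one disturbed subgroup (the sixth).
steady = [9.9, 10.1, 10.0, 10.0]
disturbed = [10.4, 10.6, 10.5, 10.5]
measurements = steady * 5 + disturbed + steady * 2

subgroups = rational_subgroups(measurements, size=4)
center, lower, upper, signals = average_chart(subgroups)
print(round(lower, 3), round(center, 3), round(upper, 3), signals)
```

Because the data are kept in order of production, the signal points to a particular time at which to look for an assignable cause; the same 32 measurements lumped into one large sample would show only a slightly inflated average and spread.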

In the limiting state of stability, called statistical control or random- 
ness, the formulas of distribution theory may be applied to the problem 
of determining what percentage of the inspection data from tomorrow’s 
product will fall within certain fixed limits: the fundamental problem
of mathematical statistics (q.v.). In this state the statistician may 
make full use of the calculus of probabilities, with no reference to 
further study of the process, because tomorrow’s product is simply, in 
effect, some more product drawn from the same bowl. After a program of 
statistical control is in operation, disposition of the product, lot by 


9 These remarks are extended in a paper by W. Edwards Deming and Frederick F. Stephan “On the 
interpretation of censuses as samples,” this JOURNAL, Vol. 36, March 1941, pp. 45-49.

10 Cf. the section, “Repeated patterns in subseries. Significance;” also the reference to Keynes in 
footnote 8. 
lot, becomes to greater or less extent a Type B problem instead of
Type A, and the amount of inspection can usually be greatly dimin- 
ished, with attendant savings in time, labor, and materials. 

The state of statistical control is not easily attained. It has never 
been known to happen by intent alone; it is rather something to be 
achieved after weeks or months of effort. Every step accomplished 
toward the attainment of statistical control, however, results in savings 
in time and materials and cost of inspection, to both the producer and 
the consumer. 


Frequent samples desirable in social and economic studies. Just as the 
examination of frequent small samples has accomplished so much in 
the Type B problems of industry, so also, the introduction of frequent 
small sample surveys of population and agriculture (perhaps 3 or 5 
per cent) throughout the country would constitute a distinct advance 
in social and economic planning. They would provide a record of chang- 
ing conditions, while these changes are taking place. Thus they would 
facilitate studies of the underlying cause systems that make the popu- 
lation what it is, and provide a better basis for planning than we have 
now. They would also enhance the value of the detail furnished by the 
complete census, and would give an indication of information that 
needs to be obtained in detail at the next complete census. 

For like reasons, quick sample reports are supplementing or displac- 
ing the slower and unwieldy complete counts in many economic surveys 
needed in government Type B planning. The presence of some sampling 
error is often unimportant compared with the advantages of the quick 
and frequent returns made possible, usually at lower cost, by sampling. 
These advantages are appreciated when the distinction between the 
Type A and Type B problems is implicitly or explicitly recognized. In 
the time series of sampling results, it is usually the relation of one 
sample to the previous samples that provides a basis for action, rather 
than any one sample by itself.¹¹

In this connection, it is easy to wax enthusiastic over the advantages 
that would accrue from a monthly report showing causes of death by 
various areas throughout the country. By the use of sampling, a report 
could be issued a few weeks after the close of each month. Epidemics 
could be recognized in their early stages, and steps taken at the right 
time to keep them from spreading. They could be traced as they move 
about. The complete annual report would still be valuable, but for 
other purposes, such as for detail by small areas, and for extremely
rare diseases. This would be a control program as exciting as any in
industry.

11 See footnote 10.


MEASURING NATIONAL INCOME
AS AFFECTED BY THE WAR*


By MILTON GILBERT
The Bureau of Foreign and Domestic Commerce 


THE SUBJECT “Measuring National Income as Affected by the War,”
is sufficiently broad to embrace many problems of income measure-
ment. There are theoretical issues concerning the meaning of national
income in wartime as either a production or a welfare measure that 
could be discussed. Such questions are being passed over, however, in 
favor of a problem of more immediate and practical interest—that of 
how expenditures for war purposes can be compared with national in- 
come so as to indicate what value of goods and services remain for civil- 
ian uses of various sorts. This emphasis is intended as an aid to those 
who have turned to national income as a practical tool in connection 
with responsibilities imposed by the war, and who have not the time to 
explore the subject in technical literature. Those thoroughly familiar 
with national income problems will find little that is new, apart from 
the estimates for recent years that are presented. 

As is well known, the national income has risen to record levels under
the stimulus of what was formerly the defense program and is now the
war effort. (Department of Commerce concepts and estimates are used
throughout this paper.) In 1941 it reached 94.7 billion dollars as com- 
pared with the 1940 total of 77.3 billion dollars, the gain of over 17 
billions constituting the largest annual increase in our history. This rise 
in the national income is usually contrasted, in one form or another, 
with the change in the stimulus itself, that is, with the increase in arma- 
ment outlays. To take the figure most commonly used, the total of de- 
fense expenditures and British (or Allied) armament purchases rose 
from about 4 billion in 1940 to about 15 billion in 1941, an increase in 
the neighborhood of 11 billion dollars. 

Inasmuch as the national income is defined as the net value of the 
goods and services produced by the economic enterprises of the Nation, 
it would seem quite appropriate to make a direct comparison between 
these two aggregates. If that comparison is intended merely as a rough 
measure of the magnitude of the country’s war effort or of changes in 
the size of the effort over time, not too much violence to the facts may 
be done. A real difficulty arises, however, when an attempt is made to 
draw inferences from the national income and war expenditures aggregates
regarding changes in the level of civilian goods output. This difficulty
may be highlighted by asking the following question: If 11 billion
of the 1941 increase in national income was being utilized for national
defense purposes, must there not have been very little increase in the
value of goods and services utilized for civilian purposes? In fact, if the
6 billions remaining after the defense total is subtracted from the
increase in output are contrasted with the increase in national income due
solely to the rise in prices, would there not seem to have been negligible
change in the real output of consumption and capital goods for private
purchase?

* A paper presented at the 103rd Annual Meeting of the American Statistical Association, New
York, December 29, 1941.

Anyone familiar with what happened to other reliable economic indi- 
cators during the past year would be likely to suspect that this conclu- 
sion is untenable and, consequently, that there is something wrong with 
the arithmetic producing such a result. This, of course, is the case. De- 
fense expenditures and national income are not fully comparable 
aggregates. The one cannot be subtracted from the other and yield a 
significant remainder. Care must be exercised, also, in expressing war 
expenditures as a percentage of national income.¹
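
The arithmetic that leads to the untenable conclusion is easy to reproduce. The Python sketch below uses only the figures quoted above (the armament totals are the rounded "about 4" and "about 15" billion); the variable names are illustrative. It is shown as the calculation the text rejects, not as a valid estimate of civilian output.

```python
# Figures quoted in the text, in billions of dollars.
national_income = {1940: 77.3, 1941: 94.7}
# Defense expenditures plus British (Allied) armament purchases, rounded in the text.
armament_outlays = {1940: 4.0, 1941: 15.0}

income_gain = national_income[1941] - national_income[1940]
armament_gain = armament_outlays[1941] - armament_outlays[1940]

# The naive step: subtracting one aggregate from the other as if both
# were measured on the same basis. The text explains why they are not.
naive_civilian_gain = income_gain - armament_gain

print(f"Gain in national income:   {income_gain:.1f}")
print(f"Gain in armament outlays:  {armament_gain:.1f}")
print(f"Naive 'civilian' residual: {naive_civilian_gain:.1f}")
```

The residual of roughly 6 billion is the figure questioned in the preceding paragraph; the adjustments discussed in the rest of the paper are what make a meaningful comparison possible.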

This does not mean that either of these figures by itself has not a 
clearly defined meaning but merely that they represent different types 
of aggregate which are incomparable. To anticipate somewhat, the 
reason for this, briefly, is that defense expenditure is (firstly) a sum of 
transactions in the sense that it includes payments other than those for 
goods currently produced, and (secondly) a sum in which the goods and 
services included are valued at prices paid. The national income, on the 
other hand, is (firstly) a net value of current output in which the net is 
defined in a special way, and (secondly) a total in which the valuation 
of output is at costs paid or accruing to the factors of production rather 
than at sales prices to final users. Hence, in order to obtain a meaningful 
comparison between war goods and total output, adjustment is re- 
quired in both war expenditures and national income. The character of 
these adjustments will be shown through an explanation of the various 
items in Table I. It should hardly be necessary to add that the esti- 
mates are preliminary, particularly those for 1941. The Bureau of 
Foreign and Domestic Commerce is engaged in making direct estimates 
of this sort in a thorough and careful way, refining and extending the 
outstanding work of Professor Kuznets in the field of commodity flow 
and capital formation, but this job is as yet incomplete. 

1 See “Measuring the Economic Impact of Armament Expenditures,” by R. W. Goldsmith, a paper
presented at the 1941 meetings of the American Statistical Association.


TABLE I
DERIVATION OF GROSS NATIONAL PRODUCT AT MARKET
PRICES FROM NATIONAL INCOME*
(Billions of dollars)

                                                          1939    1940    1941

National income                                           70.8    77.3    94.7
Plus:   Corporation income, etc., taxes                    1.5     2.8     6.9
        Other business taxes                               8.1     9.0    10.7
        Accounting depreciation                            6.4     6.5     7.0
        Capital outlays charged to current expense          .8     1.0     1.8
        Other business reserves                             .8      .9     1.6
        Inventory revaluations                             -.3     -.4    -3.2
Equals: Gross national product at market prices           88.1    97.1   119.5

* Since this paper was written estimates for the years 1929-1941 have been published in the Survey
of Current Business, May 1942. Reference may be made to that publication for technical notes on the
estimates.

Source: Bureau of Foreign and Domestic Commerce.


One must begin a discussion of this sort by pointing out that there is
no one correct measure of income or output that can be used indiscrim-
inately in every type of economic problem. Any measure is correct so
long as it is consistently worked out from a strictly defined concept,
but the concept itself must be adapted to the analytical purpose in
view. Problems of taxable capacity, of economic welfare, or of produc-
tivity all require different measures, even though it is considered
desirable to call only one concept “the national income” just to avoid 
confusion. The same thing can be said about comparisons of war ex- 
penditures and total output. The concept of total product used must be 
framed according to the purpose for which the comparison is made. A 
definite objective has, therefore, been set here: the derivation of the
total of consumers’ purchases of market goods and services by the sub- 
traction of war expenditures and any other non-consumer spending 
from total product. A more general usefulness may be served by the 
discussion in pointing out some of the statistical and conceptual diffi- 
culties in this type of estimating. 

As already indicated, the national income cannot properly be used as 
the measure of output from which to deduct war expenditures and 
other non-consumer purchases so as to yield consumers’ purchases, 
even neglecting for the moment the incomparability of the war ex- 
penditures total. The reason for this may be clarified by considering 
what measure of total output would be appropriate. Now, war expendi- 
tures are largely made up of two sorts of purchases of current output. 
In the first place, there are the purchases made by government of goods 
and services produced by private industry. Consequently, the measure 
of total output appropriate to the present purpose must contain the
value of the output of private enterprise at final market prices. This
might be obtained by adding up the sales of each business unit, adjust- 
ing for its change in inventory, and then deducting its purchases from 
other business units. Net sales of all business summated would yield the 
desired value of product for private enterprise. This figure may best be 
thought of as the income from sales that would be shown by a consoli- 
dated income statement for all private enterprises, with adjustment for 
changes in inventory holdings. 
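
A toy consolidation may make the procedure concrete. The three firms and all of their figures in the Python sketch below are hypothetical, invented only to illustrate the rule of sales, plus change in inventory, less purchases from other business units; none of them is drawn from the estimates in this paper.

```python
from dataclasses import dataclass


@dataclass
class Firm:
    name: str
    sales: float                    # total sales, to business and to final buyers
    inventory_change: float         # increase (+) or decrease (-) in inventory holdings
    purchases_from_business: float  # purchases from other business units

    def net_sales(self) -> float:
        """Sales adjusted for inventory change, less purchases from other firms."""
        return self.sales + self.inventory_change - self.purchases_from_business


# Hypothetical chain: the mill sells only to the parts maker, the parts
# maker sells only to the automobile plant, and the plant sells to final buyers.
firms = [
    Firm("steel mill", sales=40.0, inventory_change=2.0, purchases_from_business=0.0),
    Firm("parts maker", sales=70.0, inventory_change=-1.0, purchases_from_business=40.0),
    Firm("automobile plant", sales=120.0, inventory_change=3.0, purchases_from_business=70.0),
]

private_output = sum(firm.net_sales() for firm in firms)

# Consolidated view: interfirm sales cancel, leaving final sales plus
# the net change in inventories for the group as a whole.
final_sales = 120.0
total_inventory_change = sum(firm.inventory_change for firm in firms)

print(f"Sum of net sales:             {private_output:.1f}")
print(f"Final sales plus inventories: {final_sales + total_inventory_change:.1f}")
```

In this made-up case both routes give the same total, which is the sense in which the figure corresponds to a consolidated income statement for all private enterprises.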

In the second place, war expenditures are utilized to pay for goods 
and services produced directly by government; consequently, the meas- 
ure of aggregate output must include the cost value of government pro- 
duction. This can be obtained by summing the payments made to 
factors of production employed directly by government. When this 
sum is added to the previous total of the value of private enterprise 
output, a total output aggregate would be had from which could be 
subtracted the various categories of non-consumer expenditures. 

The national income, on the other hand, differs quite considerably 
from this measure of gross product. It is made up of the sum of the re- 
turns paid to or accruing to the various factors of production. It con- 
tains already, therefore, the cost value of government production 
specified above. In the sphere of private enterprises, however, while 
the national income contains the preponderance of the unduplicated 
charges against gross revenue as specified above, that is, returns to
factors, it does not contain all such charges and hence is not equal to 
gross revenues. The additions made to national income in Table I are 
designed essentially to arrive at consolidated revenues from sales by 
securing total charges against revenues. It should be pointed out that, 
since the data are not complete, one must be content with building up 
a total which falls a little short of consolidated gross revenues. As there 
is no reason to suspect any consistent bias in this difference, however, 
it can be assumed that the total obtained moves parallel to gross rev- 
enues. 

The principal charges against gross revenues, in addition to returns 
to factors (wages and salaries, net rents, interest, dividends, and undis- 
tributed profits), are business taxes and accounting depreciation and 
depletion. Further, it is appropriate to include also business charges for 
special reserves and for bad debt losses. Consideration may now be 
given to why these adjustments to national income are either necessary 
or desirable, starting with business taxes. 

The national income is defined as the net value of the economic 
goods produced. In giving specific content to this definition, the value
of output is taken net of several possible elements of duplication, two
of which result in making the value too net for our present purpose. It 
is, of course, net of the value of the intermediate products of private 
enterprise used in the production process. Insofar as this means not 
counting the value of the steel, the textiles, and the chromium in addi- 
tion to the final value of the automobile, this is of no concern since the 
result achieved is the same as that obtained from the gross revenues of 
a consolidated income statement. 

But this idea of netness is carried a step farther to exclude the inter- 
mediate products of government—that is, the government services 
rendered to business. Business taxes may be viewed as not included in 
the national income, in part to avoid this duplication. In other words, 
since government services to business are already in the national in- 
come when government output is taken at cost, they may be considered 
as eliminated in measuring the net output of private enterprise by de- 
ducting part of the taxes paid by business from consolidated gross rev- 
enues of business. 

If war expenditures are to be deducted, however, the measure of out- 
put cannot exclude the intermediate products of government, for the 
costs of prosecuting the war are partly a service to business. Should 
intermediate products of government be omitted from the measure of 
aggregate output, what would be left after deducting war expenditures 
might be something less than the total amount of remaining final prod- 
ucts. Similarly, the amount of dollars paid by consumers for the out- 
put of private enterprise is gross of any services government renders to 
business and included in the product purchased. It may be seen that 
the issue hangs not on any implicit classification of government serv- 
ices but on the fact that the services to business are measured by some 
part of business taxes. 

The national income is also net of what for some purposes might be
the duplication of values involved in counting both the market value of 
privately produced goods inflated by taxes on those goods, and the 
government services rendered with the use of those taxes. So long as 
sales, excise, or other taxes are levied which raise prices above factor 
costs, this sort of duplication² is present, even though no government
services are assumed to be rendered to business. In part, therefore, 
business taxes may be viewed as not included in national income to 
eliminate this duplication in the value of final output. Because this is 
done, the national income is strictly a measure of the value of net output
at factor costs, rather than at market prices. While each of these
measures has its uses, the latter is absolutely required here. This is so
both because the consumer buys goods at market prices and because
the privately produced goods purchased with war expenditures are
bought at market prices. Thus, as neither of the possible reasons for
leaving business taxes out of the national income applies to the concept
of output value required here, all business taxes are added to national
income in Table I.

2 “Duplication” is not a good word in this connection, since for some purposes the inclusion of taxes
is not “double counting.”

The term “business taxes” as used in this connection has nothing to 
do with incidence. It represents merely such taxes as are paid by or 
through business as a matter of administration, regardless of whether 
they are passed on in the form of higher prices or not. 

The estimates of business taxes shown in Table I have two major 
components. The first includes corporate income, excess profits, and 
capital stock taxes. Taxes on personal incomes from unincorporated 
business are not added since they are not excluded from the national 
income. The second component is all other taxes paid by business to 
government units, with the exception of the pay roll taxes paid by 
employers under the Social Security system which are already included 
in the national income estimates under “Other Labor Income.” All 
taxes are taken on an accrual basis since the profit estimates currently 
included in the national income are derived from business income state- 
ments in which profits generally are shown net of accrued taxes rather 
than tax payments. There may be exceptions to this procedure, of 
course, but hardly for tax liabilities which vary significantly from year 
to year or for the taxes affected by the substantial rate increases since 
the start of the rearmament program. 

The next item added to national income in Table I is business 
charges for depreciation and depletion. Obviously, the purpose here
calls for adjusting national income as estimated in the Department of 
Commerce with the actual accounting charges of corporations and with 
analogous estimates, consistent with the character of the profits esti- 
mates, for non-corporate business. If depreciation and depletion 
charges were constant through time, this adjustment would be unneces- 
sary, provided that interest were centered in the changes in consumers’ 
expenditures rather than in their absolute amount. The changes in the 
residual, after the other appropriate adjustments and aside from statis- 
tical errors, would be the changes in the value of consumers’ expendi- 
tures. Depreciation charges today, however, have lost much of their 
stability; a direct comparison of national product at market prices and 
defense expenditures for this purpose is, therefore, an unwarranted
procedure which is improperly dignified by calling it “statistical.”
Today, several factors are making for much higher charges for depre- 
ciation and depletion. These are the change in the depreciation rates to 
be allowed by the Bureau of Internal Revenue whereby defense facili- 
ties may be amortized in 5 years, the recent and prevailing high level 
of capital formation on private account, and the accelerated rate of 
mining output. 

Furthermore, it is thought desirable to use gross national product for 
comparison with war expenditures, rather than net national product, 
if only to emphasize that accounting depreciation charges constitute a 
very inadequate measure of capital consumption. This is particularly 
true in time of war when the unavailability of many types of new equip- 
ment necessarily means that obsolescence of old equipment is slowed 
down considerably. Nonetheless, comparison of war expenditures with 
net national product has its uses and is in no sense incorrect. It can 
serve, for example, to bring into focus the fact that net capital con- 
sumption is an important source of war finance in real terms. 

Addition of “capital outlays charged to current expense” is desirable 
for the same reasons as depreciation and depletion charges. By so doing 
the concept of gross national product is made consistent with gross 
capital formation defined as all investment goods having an average 
life of three years or more. Both concepts are, therefore, made to con- 
form with economic notions of gross investment and freed from the 
vagaries of accounting practice. 

The “other charges and reserves” that have been added to national 
income in Table I contain special emergency and contingency reserves 
and charges for bad debt losses. The special reserves being set up by 
many business concerns because of the uncertainties of the present 
situation must be added because, like taxes, they are covered by sales 
and yet not in the current estimates of returns to factors of production. 
It should, perhaps, be made clear that inclusion of this item raises no 
question about the necessity for the setting up of unusual reserves by 
business management. Many of them are intended to cover anticipated 
losses of foreign assets and may well be too low for the loss eventually 
incurred. Neither capital gains nor losses, however, are included in the 
national income, nor are such charges relevant in deriving a total of 
consumers’ purchases. 

So far as charges for bad debt losses are concerned, to the extent that 
these represent consumer bad debts, goods and services of equivalent 
value actually do reach consumers. They must be added therefore to 
make possible the derivation of the estimate of the sales value of goods
passing into the hands of consumers. As for business bad debt charges,
in the consolidation of the accounts of business enterprises it is the rev-
enues of sellers gross of charges for business bad debt losses which can-
cel against the purchases of business buyers.³ Hence it is necessary to
add all bad debt charges to business profits which have been computed 
net of such charges, as well as the other unduplicated items used here, 
to build up the approximation of business consolidated gross revenues. 

The adjustment for revaluation of inventory, also included in Table 
I, is of a different character than the preceding additions to national 
income. It is not a necessary adjustment, providing that the estimate of 
gross private capital formation subtracted later in reducing the value
of output total to a residual of consumers’ purchases contains the 
change in the book value of inventory rather than the current value of 
the quantity change in inventory. Since many persons are accustomed 
to working with historical inventory data which are adjusted for re- 
valuation, as estimated by Professor Kuznets, their inclusion here only 
serves to emphasize the obvious need for using a current estimate that 
is on the same basis as historical data, whether these be inventory or 
gross national product series. 

With all these items added to the national income, it may be seen 
that there is now an increase of more than 22 billion dollars from 1940 
to 1941 to be distributed among the various types of expenditure as 
against the 17 billion increase in the national income. And this is after 
the inclusion of a negative inventory revaluation estimate of over 3 
billion which significantly affects the distribution of expenditures. This 
figure, which amounted to nearly 120 billion in 1941, has been labelled, 
somewhat hesitantly, “gross national product at market prices,” in the 
hope that the last three words will clearly distinguish it from Professor 
Kuznets’ gross national product concept. It is certainly a grosser 
“gross national product” than that which has become familiar through 
his work. 
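
The build-up in Table I can be checked with a few lines of Python. The sketch below uses only the figures printed in the table; the variable names are illustrative, and results are rounded to one decimal to match the published precision.

```python
YEARS = (1939, 1940, 1941)

# Table I, billions of dollars.
national_income = (70.8, 77.3, 94.7)
additions = {
    "Corporation income, etc., taxes": (1.5, 2.8, 6.9),
    "Other business taxes": (8.1, 9.0, 10.7),
    "Accounting depreciation": (6.4, 6.5, 7.0),
    "Capital outlays charged to current expense": (0.8, 1.0, 1.8),
    "Other business reserves": (0.8, 0.9, 1.6),
    "Inventory revaluations": (-0.3, -0.4, -3.2),
}

# Gross national product at market prices = national income plus all additions.
gross_product = [
    round(income + sum(row[i] for row in additions.values()), 1)
    for i, income in enumerate(national_income)
]

for year, value in zip(YEARS, gross_product):
    print(f"{year}: gross national product at market prices = {value}")

# Increases from 1940 to 1941 on the two bases.
gain_in_gross_product = round(gross_product[2] - gross_product[1], 1)
gain_in_national_income = round(national_income[2] - national_income[1], 1)
print(f"1940-41 gain, gross product:   {gain_in_gross_product}")
print(f"1940-41 gain, national income: {gain_in_national_income}")
```

The computed totals agree with the printed row of Table I, and the two gains correspond to the "more than 22 billion" and "17 billion" increases contrasted in the text.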

It is now possible to proceed with the second part of the problem, 
subtracting the various sorts of non-consumer purchases from current 
gross output so as to leave the desired residual of consumers’ purchases. 
This is shown in Table II.

There are only two points to be made in this connection. The first is
that the various deductions have to be for the value of goods taken out
of current output and not for money transactions not influencing cur-
rent output. The second is that incorrect combinations and compari-
sons of production and purchases data have to be avoided. From the
prevalence of errors on both these scores, some discussion may be justi-
fied.

3 The above statement may perhaps be insufficient to confirm the necessity of adding back charges
for losses on business bad debts. The following example may be helpful. Firm A, selling only to B,
computes its profits net of bad debt losses. Firm B, buying from A, computes its profits on the basis of
contract prices for all purchases, not on the basis of the lower value, loss by defaults, actually paid for
purchases. Hence, the profits of A plus the profits of B plus all other consolidated charges against reve-
nues except bad debt charges fall short of the revenues of B by the amount of bad debt charges.


TABLE II
COMPOSITION OF GROSS NATIONAL EXPENDITURE AT MARKET PRICES
(Billions of dollars)

                                                            1939    1940    1941

Gross national expenditure at market prices                 88.1    97.1   119.5
Less:   Government expenditures for goods and services:
        Defense expenditures                                 1.4     2.8    13.5
        Prepayments, land, offshore expenditures, etc.                      -2.3
        Federal non-defense                                  5.4     5.2     5.2
        State and local                                      8.3     8.3     8.2
          Total                                             15.1    16.3    24.6
Equals: Private output for private use                      73.0    80.8    94.9
Less:   Private gross capital formation:
        Construction: Residential                            2.0     2.3     2.7
                      Factory and public utility              .8     1.1     1.4
                      Other                                   .9     1.0     1.1
        Equipment                                            5.4     6.6     8.9
        Net export of goods and services                      .8     1.4      .9
        Net export of gold and silver                       -3.2    -4.1     -.6
        Net change in business inventories                    .8     1.8     3.6
        Net change in monetary stock                         3.5     4.5     3.2
          Total                                             11.0    14.6    19.1
Equals: Consumers’ purchases of consumption goods*          62.0    66.2    75.8
Less:   Durable commodities                                  7.1     8.3    10.3
Equals: Non-durable goods and services*                     54.9    57.9    65.5

* Residual.
Source: Bureau of Foreign and Domestic Commerce.

The first item deducted in Table II is national defense expenditures.
Used here is the total as shown in the Daily Treasury Statement plus 
changes in the assets of national defense corporations except for changes 
in their cash balances and transfers from other government agencies. It 
will be noted immediately that this total differs considerably from the 
15 billion dollar figure cited earlier because it excludes armaments pur- 
chased in this country by foreign governments. This is the proper pro- 
cedure for a source of expenditure breakdown since foreign purchases 
appear farther down in the table under net change in investment 
abroad. Those thinking in terms of a type of product breakdown should 
keep in mind, obviously, that a higher total for war goods would effect
a reduction in the net change in investment abroad and not in consum-
ers’ goods available. The same holds true for any shift from foreign
purchasing of armaments to lend-lease shipments, the latter being in-
cluded in war expenditures but not in net change in claims abroad.

As it stands, however, the national defense expenditures total cannot 
be deducted from gross national product, because it is a sum of pay- 
ments made rather than a sum of purchases of goods and services out of 
current output. The adjustment shown is introduced to make the two 
figures comparable. The total contains, for example, a substantial amount of
advance payments made to manufacturers holding defense contracts 
for which no goods have as yet been received. The adjustment for pre- 
payments should be net rather than gross, that is, the net outstanding 
payments on which deliveries have not been made as of the end of the 
period in question. At the present stage in the war production program, 
of course, the net prepayments are still a large positive sum which, 
therefore, appears as a negative adjustment to defense expenditures. 

The other principal items of expenditure which have no counterpart 
in current output in the United States and which consequently must be 
included in a negative adjustment are purchases of land or other exist- 
ing assets, except those affecting the inventory estimate, and off-shore 
expenditure for either labor or materials. One might mention, in addi- 
tion, checking accounts set up regionally to facilitate the operations of 
the Quartermaster Corps and minor intergovernment transfers. All
these adjustments together make quite a substantial sum which can 
lead to significant error when a total of war expenditures is compared 
with or deducted from current output. 

Because all business taxes were included in the computation of the 
value of gross output, total non-defense government expenditures for 
current output are deducted in the next two items. As with net defense 
expenditures, these are substantially different from a total of non-de- 
fense expenditures by the Federal Government and expenditures of 
state and local governments. Budgeted expenditures have been ad- 
justed to eliminate such payments as intergovernmental transfers, 
direct relief, Social Security benefits, veterans’ pensions, purchases of 
land, etc., as none of these appears in the estimate of gross national 
product. It may be mentioned that the output of public service enter-
prises, such as the post office or publicly-owned utilities, operating out-
side government budgets is automatically excluded here and, there-
fore, appears below under the total of private goods for private use.

Little comment is needed on the various categories of capital forma- 
tion shown in the table except to emphasize again that they are gross 
expenditures and that they are strictly on private account. The inven-
tory estimate is the book value change adjusted for revaluations, since
this adjustment was made in building up the gross national product. It
represents, that is, the current year market value of the increase in the
physical stock of inventories.⁴
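
The chain of deductions in Table II can be traced in a short Python sketch. It uses the figures printed in the table, with two assumptions that are labeled in the comments: the blank entries for the prepayments adjustment in 1939 and 1940 are treated as zero, and the capital formation deduction is taken from the printed totals rather than rebuilt from its components. Variable names are illustrative.

```python
YEARS = (1939, 1940, 1941)

# Table II, billions of dollars.
gross_expenditure = (88.1, 97.1, 119.5)
government = {
    "Defense expenditures": (1.4, 2.8, 13.5),
    # Assumption: blank table entries for 1939 and 1940 are read as zero.
    "Prepayments, land, offshore expenditures, etc.": (0.0, 0.0, -2.3),
    "Federal non-defense": (5.4, 5.2, 5.2),
    "State and local": (8.3, 8.3, 8.2),
}
# Assumption: printed totals are used as given, not recomputed.
capital_formation_total = (11.0, 14.6, 19.1)
durable_commodities = (7.1, 8.3, 10.3)

results = {}
for i, year in enumerate(YEARS):
    government_total = round(sum(row[i] for row in government.values()), 1)
    private_output = round(gross_expenditure[i] - government_total, 1)
    consumers_purchases = round(private_output - capital_formation_total[i], 1)
    non_durable = round(consumers_purchases - durable_commodities[i], 1)
    results[year] = {
        "government_total": government_total,
        "private_output": private_output,
        "consumers_purchases": consumers_purchases,
        "non_durable": non_durable,
    }
    print(year, results[year])

# Net defense purchases out of current output in 1941, after the adjustment.
net_defense_1941 = round(13.5 - 2.3, 1)
print(f"Net defense purchases, 1941: {net_defense_1941}")
```

Each computed subtotal matches the corresponding printed row of Table II, which confirms that consumers' purchases and non-durable goods and services are obtained purely as residuals.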

With all these subtractions from gross product, there is obtained as a 
residual the measure of consumers’ purchases which has been the ob- 
jective of this calculation. It shows that there was a very substantial in- 
crease in consumers’ purchases in 1941. The estimate could be adjusted 
for the increase in the cost of living to indicate how much more real
goods and services the consumer secured. This would not be the equiv-
alent of consumption of individuals, however, for there are items of
such consumption included in the government purchases, the most im-
portant of which to remember at this time being the food, clothing, and
shelter provided to the armed forces. If one chooses to use the consump-
tion of individuals as a rough measure of changes in material welfare,
therefore, the increase in consumption provided out of government
funds should be included.


TABLE III
NATIONAL INCOME BY USE OF FUNDS
(Billions of dollars)

                                                          1939    1940    1941

National income                                           70.8    77.3    94.7
Plus:   Transfer payments from government                  2.5     2.7     2.4
Less:   Corporate savings                                   .4     1.3     2.6
        Employment taxes                                   2.0     2.2     2.4
        Direct personal taxes                              2.9     3.0     3.8
          Federal                                          1.2     1.3     2.1
          State and local                                  1.7     1.7     1.7
Equals: Disposable income of individuals                  68.0    73.5    88.3
Less:   Consumer expenditures for goods and services      62.0    66.2    75.8
Equals: Net savings of individuals*                        6.0     7.3    12.5

* Residual.

Source: Bureau of Foreign and Domestic Commerce.

A third table has been added in order to indicate the magnitude of 
personal savings implicit in the consumers’ purchases and national in- 
come totals. This filling out of the picture is useful in analysis of the 
fiscal problem. Little explanation seems required except, perhaps, to 
mention that the estimate of personal taxes is taken on a payments 
basis. This is of some importance in any speculation about changes in 
the propensity to consume. 
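
The derivation of personal savings in Table III can be sketched in the same way. The Python below uses the figures printed in the table. The final ratio of consumer expenditures to disposable income is an illustrative addition, not a series given in the paper, and the variable names are assumptions of this sketch.

```python
YEARS = (1939, 1940, 1941)

# Table III, billions of dollars.
national_income = (70.8, 77.3, 94.7)
transfer_payments = (2.5, 2.7, 2.4)
corporate_savings = (0.4, 1.3, 2.6)
employment_taxes = (2.0, 2.2, 2.4)
direct_personal_taxes = (2.9, 3.0, 3.8)   # Federal plus state and local
consumer_expenditures = (62.0, 66.2, 75.8)  # residual carried over from Table II

disposable_income = [
    round(
        national_income[i]
        + transfer_payments[i]
        - corporate_savings[i]
        - employment_taxes[i]
        - direct_personal_taxes[i],
        1,
    )
    for i in range(len(YEARS))
]

# Net savings of individuals is itself a residual.
net_savings = [
    round(disposable_income[i] - consumer_expenditures[i], 1)
    for i in range(len(YEARS))
]

for i, year in enumerate(YEARS):
    # Illustrative only: share of disposable income spent on consumption.
    spent_share = consumer_expenditures[i] / disposable_income[i]
    print(
        f"{year}: disposable income {disposable_income[i]}, "
        f"net savings {net_savings[i]}, share spent {spent_share:.3f}"
    )
```

Because savings are a residual of two estimated totals, any error in consumers' purchases or in the tax figures passes directly into the savings estimate, which is one reason the payments basis of the personal tax estimate matters.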

Before concluding it may be mentioned that the comparison of war 
expenditures and national product (either net or gross) discussed above 


4 Certain implications of the estimates for the year ahead are discussed by the writer in “War Ex- 
penditures and National Production,” Survey of Current Business, March 1942. 
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is only one type of comparison that can be made. It is the useful type 
when the objective is the tracing of expenditure flows. If, however, one 
is interested in the disposition of economic resources, as must be the 
case in problems concerning the war potential, it is necessary to make 
the comparison of war and non-war output in terms of factor costs. 

It may be seen from the previous discussion that it was inappropriate 
to compare national income with war expenditures directly because the 
national income was, in a sense, too net a figure for the war expendi- 
tures total—even apart from adjustment suggested for war expendi- 
tures. The process of converting national income to gross national prod- 
uct, therefore, was essentially one of increasing the size of the national 
product concept to make it fit the concept implicit in the war expendi- 
tures. Now, it is possible to achieve this comparability the other way 
around; that is, by reducing the war expenditures figure until it is just 
as net as the national income so that both aggregates are in terms of 
factor costs. 

In order to do this one must allocate business taxes between war ex- 
penditures and all non-war expenditures, and then reduce the war ex- 
penditures total by the amount of business taxes associated with it. By 
this means national income and war expenditures would be rendered 
directly comparable, due account being taken of the other adjustments 
previously mentioned. Comparison of this type is implicit in the tables 
on net national income and net national expenditure contained in the 
British white paper on the Sources of War Finance,5 and the allocation 
of taxes is made directly by Mr. N. Kaldor in his comments on the 
white paper in the Economic Journal for June-September 1941. 

It should be emphasized that this way of handling the problem is not 
a mere difference in methodology; the results achieved can serve dif- 
ferent purposes. Specifically, if the objective is an estimate of the real 
resources in terms of factor costs being devoted to the war effort as 
against those being utilized for civilian purposes, it is essential that the 
taxes implicit in the two categories of expenditures be eliminated. 
Similarly calculation of the war potential in terms of real resources 
must be made ex business taxes. The reason for this, obviously, is that 
there can be no presumption that factor costs are proportionate to 
market prices. 

A few words of caution must be added concerning this use of the 
national income as a measure of the quantity of real resources currently 
utilized. It is subject to severe technical limitations. In the first place, 
modern cost theory does not lend itself very readily to any such concept 
as the quantity of real resources. Furthermore, there is the difference 
in accounting methods as between government and private business, 
which accounting records form the basis of estimates of factor 
costs. Moreover, a complication is introduced by the fact that certain 
important elements of income get fixed outside of the market in time of 
war, for example, wages of draftees. Perhaps most of all, it is rather 
difficult in practice to make proper allowance for the fact that perfect 
competition does not rule and that returns to the factors of production 
are far from equal in all industries. Citation of these difficulties is not 
intended as a ban on the use of national income estimates in economic 
planning for war, but only as a contribution toward the best use of a 
highly valuable tool of analysis. 


5 An Analysis of the Sources of War Finance and an Estimate of the National Income and Expenditure 
in 1938 and 1940, Cmd. 6261, H. M. Stationery Office, 1941. 











SAMPLING THEORY WHEN THE SAMPLING- 
UNITS ARE OF UNEQUAL SIZES* 


By W. G. Cochran 
Iowa State College 


IN SAMPLING, the sampling-units are usually chosen so as to be similar 
in size and structure. With some types of population, however, it is 
convenient or necessary to use sampling-units that differ in size. Thus 
the farm is often the sampling-unit for collecting agricultural data, 
though farms in the same county may vary in land acreage from a few 
acres to over 1,000 acres. Similarly, when obtaining information about 
sales or prices, the sampling-unit may be a dealer or store, these ranging 
from small to large concerns. 

In such cases the question arises: Should differences between the 
sizes of the sampling-units be ignored or taken into account in selecting 
the sample and in making estimates from the results of the sample? 
This paper contains a preliminary discussion of the problem, though 
further research is needed, many of the results given below being only 
large-sample approximations. It is convenient to consider first the prob- 
lem of estimation, since it appears that the best method of distributing 
the sample depends on the process of estimation that is to be used. 


THE PROBLEM OF ESTIMATION 


To state the problem of estimation in mathematical terms, we assume 
that sampling-units are drawn at random without regard to their 
sizes, and consider how to estimate the population total of some quantity 
y which can be measured on each sampling-unit. Associated with 
each sampling-unit is also a quantity x, which is called its area rather 
than its size, to avoid possible confusion between the terms “size of 
sample” and “size of sampling-unit.” Some knowledge is assumed to 
be available about the values of x in the sample, and possibly also in the 
population.1 In order to apply results from the statistical theory of 
estimation, it is also assumed that the number of sampling-units in the 
population may be considered infinite. Formulae applicable to the 
practical situation of sampling from finite populations can be obtained 
by adding suitable correction terms. 


* A paper presented at the 103rd Annual Meeting of the American Statistical Association in joint 
session with the Institute of Mathematical Statistics, New York, December 30, 1941. 

Journal paper No. J989 of the Iowa Agricultural Experiment Station, Ames, Iowa. Project No. 611. 

1 For some populations an alternative method of specification may be more appropriate. For instance, 
each sampling-unit may consist of an integral number of sub-units, as in the case of human populations 
where the sampling-unit is a household and the sub-unit is a single person. The specification may 
be made in terms of the value of y per sub-unit and the number n of sub-units per sampling-unit (cf. 
Hansen and Hurwitz, 1942). Since this approach would not apply in the examples given at the beginning 
of this paper, it will not be considered here. 

Stated in this way, the problem of estimation is a familiar one in 
mathematical statistics. If the joint frequency distribution of x and y 
in the population is known, the theory of estimation provides a routine 
technique leading to an efficient estimate of the population total of y 
and using to best advantage any available information about x. There 
are, however, difficulties in utilizing this method of approach. The 
joint frequency distribution is often known at best only vaguely from 
the available data, and may not appear to follow any of the few types of 
bivariate frequency distribution that have been studied. Further, 
there are strong administrative arguments for keeping the computa- 
tions involved in making the estimate as simple as possible; these re- 
quirements may impose a bar on the use of estimates which, while 
highly efficient statistically, are rather difficult to compute. 

Both difficulties can be met to some extent by restricting the esti- 
mates to those derived from the regression of y on x. For the calculation 
of regression equations, it is not necessary to describe completely the 
joint frequency distribution of x and y; we need only know how the 
mean value and the variance of y change as x changes. These can be 
examined from a graph or two-way table of the pairs of values of x and 
y constructed from any available data. If the form of the regression line 
and the relative weights assigned to different values of y are correct, 
the regression estimate is a best unbiased linear estimate as defined by 
David and Neyman (1938), though it is not a maximum likelihood 
estimate unless in addition the values of y are normally distributed 
within arrays in which x is fixed. The computations required for the 
simpler types of regression line are well-known and not unduly labori- 
ous. 

ESTIMATES DERIVED FROM LINEAR REGRESSION 


In the following sections it will be assumed that the quantity to be 
estimated is the population total of y; any formulae can easily be al- 
tered so as to refer to the estimation of the population mean per sam- 
pling-unit. 

The simplest case occurs when the mean value of y is linearly related 
to the area of the sampling-unit, with constant variance; i.e. y is of the 
form $\alpha + \beta x + e$, where e has mean value zero and constant variance in 
arrays in which x is fixed. In this case, the linear regression estimate 
$Y_l$ for the population total of y is 


$$Y_l = N\{\bar{y}_s + b(\bar{x}_p - \bar{x}_s)\} \tag{1}$$













where N is the number of sampling-units in the population, b is the 
sample regression coefficient $S(y-\bar{y}_s)(x-\bar{x}_s)/S(x-\bar{x}_s)^2$, and the suffixes 
p and s refer to the population and sample respectively. It will be noted 
that this estimate requires a knowledge both of the total number N of 
sampling-units and of the mean value of x in the population. 

In samples in which the x's remain fixed, the sampling variance of $Y_l$ 
is 

$$V(Y_l) = N^2\sigma_y^2(1-\rho^2)\left\{\frac{1}{n} + \frac{(\bar{x}_p-\bar{x}_s)^2}{S(x-\bar{x}_s)^2}\right\} \tag{2}$$

n being the number of sampling-units in the sample and $\rho$ the correlation 
coefficient between y and x. The distribution of $Y_l$ tends to normality 
as n increases, being exactly normal for any size of sample if y is 
normally distributed for fixed x. A sample estimate of this variance is 
obtained by substituting for $\sigma_y^2(1-\rho^2)$ the mean square $s_d^2$ of deviations 
from the sample regression line. 
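
As an illustration only, the calculation of the linear regression estimate and of its estimated variance might be sketched as follows. The function name and argument names are hypothetical; the variance line assumes that formula (2) has the usual form $N^2 s^2\{1/n + (\bar{x}_p-\bar{x}_s)^2/S(x-\bar{x}_s)^2\}$, with the residual mean square on n−2 degrees of freedom standing in for $\sigma_y^2(1-\rho^2)$. 

```python
def linear_regression_estimate(xs, ys, N, xbar_p):
    """Return (Y_l, estimated variance) for the linear regression estimate.

    xs, ys : sample areas and observations (equal length, n >= 3)
    N      : number of sampling-units in the population
    xbar_p : population mean area per sampling-unit
    (All names are illustrative; they do not come from the paper.)
    """
    n = len(xs)
    xbar_s = sum(xs) / n
    ybar_s = sum(ys) / n
    sxx = sum((x - xbar_s) ** 2 for x in xs)
    sxy = sum((x - xbar_s) * (y - ybar_s) for x, y in zip(xs, ys))
    b = sxy / sxx                                  # sample regression coefficient
    y_l = N * (ybar_s + b * (xbar_p - xbar_s))     # formula (1)
    # Residual mean square on n - 2 degrees of freedom (assumed estimator
    # of sigma_y^2 (1 - rho^2)).
    resid_ss = sum((y - ybar_s - b * (x - xbar_s)) ** 2 for x, y in zip(xs, ys))
    s2 = resid_ss / (n - 2)
    var_est = N ** 2 * s2 * (1 / n + (xbar_p - xbar_s) ** 2 / sxx)  # assumed form of (2)
    return y_l, var_est
```

With data lying exactly on a line, say y = 2 + 3x, the estimate equals N(2 + 3·xbar_p) and the estimated variance is zero, which is a convenient check on the arithmetic. 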

For comparison with other estimates we may require the average 
variance of the regression estimate under random sampling. From (2), 
this clearly depends on the form of the frequency distribution of the 
areas. Since the areas are essentially positive, their distribution will not 
in general be normal, except perhaps as an approximation. The mean 
value of (2) may be expanded in a series of inverse powers of n, the 
sample size. Retaining the two leading terms, we obtain 

$$\bar{V}(Y_l) = \frac{N^2\sigma_y^2(1-\rho^2)}{n}\left\{1 + \frac{1}{n} + \frac{3+2\gamma_1^2}{n^2}\right\} \tag{3}$$

where $\gamma_1$ is Fisher's (1941) measure of relative skewness ($\gamma_1^2 = \kappa_3^2/\kappa_2^3$). 
If the areas were normally distributed, $\gamma_1$ would of course be zero, and 
the exact value for the term in curled brackets would be $(n-2)/(n-3)$, 
which agrees with the value given above to this order of approximation. 
With large samples the factor is close to unity. 
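
The agreement in the normal case can be verified by an elementary expansion of the exact factor in inverse powers of n. The step below uses only the quantity $(n-2)/(n-3)$ quoted in the text: 

```latex
\frac{n-2}{n-3} \;=\; 1 + \frac{1}{n-3}
             \;=\; 1 + \frac{1}{n}\Bigl(1-\frac{3}{n}\Bigr)^{-1}
             \;=\; 1 + \frac{1}{n} + \frac{3}{n^{2}} + \frac{9}{n^{3}} + \cdots
```

For n = 50, for instance, the exact factor is 48/47, about 1.02128, against 1.0212 from the terms through $1/n^2$. 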

In many problems the true regression line must pass through the 
origin, as for example when y represents corn acreage and x farm acreage. 
Even in such cases, it may be advisable to use the preceding type of 
regression, if it appears on examination that a straight-line regression 
not passing through the origin will provide a satisfactory fit, whereas it 
would be necessary to use a curvilinear regression in order to include 
the origin. If a straight line through the origin can be used, y being of 
the form $(\beta x + e)$, with constant residual variance, the regression estimate 
$Y_o$ (o for origin) of the population total is 










$$Y_o = \Sigma(x)\,\frac{S(xy)}{S(x^2)} \tag{4}$$

where $\Sigma(x)$ is the population total of the areas. The variance of $Y_o$ is 

$$V(Y_o) = \{\Sigma(x)\}^2\sigma_y^2(1-\rho^2)/S(x^2). \tag{5}$$


The number of sampling-units in the population does not enter into 
either of these formulae, which require only the population total of the 
areas. 
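
A minimal sketch of the through-origin estimate follows, assuming that the estimate is the population total of the areas multiplied by the least-squares slope S(xy)/S(x²). The function name is hypothetical, and the residual variance is estimated here on n−1 degrees of freedom, which is an assumption of this sketch and not a statement from the text. 

```python
def origin_regression_estimate(xs, ys, total_area):
    """Return (Y_o, estimated variance) for a straight line through the origin.

    total_area is the population total of the areas; the number of
    sampling-units in the population is not needed.
    """
    n = len(xs)
    sxx = sum(x * x for x in xs)                  # S(x^2)
    sxy = sum(x * y for x, y in zip(xs, ys))      # S(xy)
    slope = sxy / sxx
    y_o = total_area * slope                      # assumed form of formula (4)
    resid_ss = sum((y - slope * x) ** 2 for x, y in zip(xs, ys))
    s2 = resid_ss / (n - 1)                       # assumption: one fitted constant
    var_est = total_area ** 2 * s2 / sxx          # formula (5) with s2 for the residual variance
    return y_o, var_est
```

As the text remarks, only the population total of the areas enters the calculation. 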

The expression for the average value of this variance, under repeated 
random sampling, is rather complicated. If the distribution of the areas 
is not far from normal, the leading terms give 

$$\bar{V}(Y_o) = \frac{N^2\sigma_y^2(1-\rho^2)}{n(1+c_x)}\left\{1 + \frac{2c_x(2+c_x)}{n(1+c_x)^2}\right\} \tag{6}$$

$c_x = \sigma_x^2/\bar{x}_p^2$ being the square of the coefficient of variation of x. 

From formulae (3) and (6), we may compare the sampling errors of 
$Y_l$ and $Y_o$ with that of the estimate $Y_s$ (s for sampling-unit) which is 
obtained by multiplying the sample mean per sampling-unit by the 
total number of sampling-units, and is commonly used where sampling-units 
are equal in size. Since the variance of $Y_s$ is $N^2\sigma_y^2/n$, the ratios of 
the three pairs of variances in large samples are as follows: 

$$\frac{V(Y_l)}{V(Y_s)} = (1-\rho^2), \qquad \frac{V(Y_o)}{V(Y_s)} = \frac{(1-\rho^2)}{(1+c_x)}, \qquad \frac{V(Y_o)}{V(Y_l)} = \frac{1}{(1+c_x)}. \tag{7}$$

The additional factors involving 1/n and $1/n^2$ have been omitted from 
these expressions; they should be included in practical applications unless 
they are negligible. In large samples, both regression estimates are 
more accurate than the sample-mean estimate, the gain in accuracy being 
considerable if $\rho$ is high. As would be expected, $Y_o$ is more accurate 
than $Y_l$ when the true regression line is straight and passes through the 
origin, the increase in accuracy depending on the coefficient of variation 
of the areas. 

These results must be interpreted with care. They indicate that in 
large samples $Y_l$ can never be less accurate, on the average, than $Y_s$. 
This statement was proved under the assumption that the true regression 
is linear (whether it passes through the origin or not); in the following 
section it will be shown to hold substantially even if the true 
regression is not linear. The conclusions about $Y_o$ have a much more 
restricted validity, holding only if the true regression is linear and 
passes through the origin. If the true regression line passes through the 
point $y=\alpha$ when x is zero, the estimate $Y_o$ is biased, the bias tending, in 
large samples, to the constant value $-N\alpha c_x/(1+c_x)$. Including this 
bias in the expression for the sampling error, we have, instead of (6), 

$$V(Y_o) = \frac{N^2\alpha^2c_x^2}{(1+c_x)^2} + \frac{N^2\sigma_y^2(1-\rho^2)}{n(1+c_x)} \tag{8}$$

Since the component arising from the bias does not decrease as the 
sample size n increases, a sample size is always reached beyond which 
both $Y_l$ and $Y_s$ are more accurate than $Y_o$, unless $\alpha$ is zero. Thus $Y_o$ 
cannot be recommended as an estimate unless it is known with considerable 
confidence that the true regression is straight and passes 
through the origin. 
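
To see how quickly the bias component takes over, the large-sample expressions may be put side by side. The sketch below assumes that formula (8) is the squared limiting bias $N^2\alpha^2c_x^2/(1+c_x)^2$ plus the leading variance term $N^2\sigma_y^2(1-\rho^2)/\{n(1+c_x)\}$, and compares it with the leading term $N^2\sigma_y^2(1-\rho^2)/n$ for the linear regression estimate. The function names, and the argument resid_var (standing for $\sigma_y^2(1-\rho^2)$), are hypothetical. 

```python
def mse_origin(N, n, alpha, c_x, resid_var):
    """Large-sample mean square error of the through-origin estimate, as in (8)."""
    bias = -N * alpha * c_x / (1 + c_x)           # limiting bias
    return bias ** 2 + N ** 2 * resid_var / (n * (1 + c_x))


def var_linear(N, n, resid_var):
    """Leading term of the variance of the linear regression estimate."""
    return N ** 2 * resid_var / n


def crossover_n(alpha, c_x, resid_var):
    """Sample size at which the two large-sample errors are equal.

    Obtained by equating mse_origin and var_linear and solving for n;
    beyond this n the linear regression estimate has the smaller error.
    """
    return resid_var * (1 + c_x) / (alpha ** 2 * c_x)
```

With the purely illustrative values alpha = 2, c_x = 0.25 and resid_var = 4, the two errors are equal at n = 5; the through-origin estimate is the better of the two below that size and the worse above it. 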


NON-LINEAR RELATIONS BETWEEN Y AND X 


It has already been pointed out that in many cases the investigator 
possesses only fragmentary knowledge of the true relation between the 
observations y and the areas of the sampling-units. Since a linear regression 
estimate may be used without any certainty that the population 
regression is linear, it is worth examining how $Y_l$ is affected when 
the population regression is non-linear. Suppose that the true relation 
is of the form 


$$y = \alpha + \beta x + \xi + e \tag{9}$$


where as before e is distributed with zero mean and unit variance, independently 
of x, and $\xi$ is a non-linear function of x. For this reason, it 
may be assumed without loss of generality that $\xi$ has zero mean and 
zero linear correlation with x. Following the usual algebraic development 
of linear regression, we find that the error of estimate 

$$Y_l - \Sigma(y) = N\left\{(\bar{\xi}_s + \bar{e}_s) + (\bar{x}_p - \bar{x}_s)\,\frac{S(\xi+e)(x-\bar{x}_s)}{S(x-\bar{x}_s)^2}\right\}. \tag{10}$$

Taking the mean value over all possible samples of n sampling-units, all 
terms become zero except the second term in $\xi$, whose mean value does 
not vanish on account of the non-linear correlation between $\xi$ and x. 
By a technique developed by Fisher (1929), this value can be expressed 
in terms of the semi-invariants $\kappa_{ij}$ of the joint distribution of $\xi$ and x, 
the first two terms of the series being 

$$E\{Y_l - \Sigma(y)\} = -\frac{N}{n}\,\frac{\kappa_{12}}{\sigma_x^2} + \text{[second term, of order } 1/n^2\text{, illegible in the source]} \tag{11}$$

Thus the regression estimate is biased, the bias however tending to zero 
as the sample size is increased, since n appears in the denominator of 
(11). The numerator $\kappa_{12}$ of the largest term depends essentially on the 
correlation between $\xi$ and $x^2$, i.e. on the quadratic component of the 
regression of y on x. 

Formula (3) for the average sampling variance of $Y_l$ is also changed, 
but the change affects only the terms in 1/n, $1/n^2$ etc. inside the curled 
brackets, the factor outside the bracket remaining $N^2\sigma_y^2(1-\rho^2)/n$, 
which in this case is equal to $N^2(\sigma_\xi^2+\sigma_e^2)/n$. Since the bias in $Y_l$ 
changes in inverse proportion to the sample size, while the standard 
error of $Y_l$ changes inversely as $\sqrt{n}$, the bias ultimately becomes negligible 
relative to the standard error if the sample is sufficiently large. 

Thus, with samples large enough so that terms in 1/n are negligible, 
the ratio of the variance of $Y_l$ to that of the sample-mean estimate $Y_s$ 
remains $(1-\rho^2)$ even if the population regression is non-linear. This 
does not of course imply that $Y_l$ is an efficient estimate in this case. If 
the correct form of regression line could be fitted, the variance of the 
regression estimate would be reduced, in large samples, to $N^2\sigma_e^2/n$, as 
compared with $N^2(\sigma_\xi^2+\sigma_e^2)/n$ for $Y_l$. As would be expected, the relative 
loss of information with $Y_l$ depends on the ratio of the variance of 
the “non-linear” component $\xi$ to the residual variance. 
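
The dependence on that ratio can be written out in one line. Writing $\sigma_\xi^2$ for the variance of the non-linear component and $\sigma_e^2$ for the residual variance (symbols as read from the two large-sample variances just quoted), the efficiency of the linear regression estimate relative to a correctly specified regression is 

```latex
\frac{N^{2}\sigma_e^{2}/n}{N^{2}(\sigma_\xi^{2}+\sigma_e^{2})/n}
  \;=\; \frac{1}{1+\sigma_\xi^{2}/\sigma_e^{2}}
```

If, purely for illustration, $\sigma_\xi^2$ were one quarter of $\sigma_e^2$, the relative efficiency would be 1/1.25 = 0.8. 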

At least part of the loss of accuracy could be recovered by adding terms 
in $x^2$, $x^3$, etc., to the regression, with a corresponding increase in the 
numerical computations. In order to use such regressions in constructing 
the estimate, however, additional population data about x are required. 
For a quadratic regression, for example, we must know both the 
population mean and variance of x to be able to calculate the regression 
adjustments to the sample mean $\bar{y}_s$. It is unlikely that these would be 
available without a complete frequency distribution of the population 
by area of sampling-unit. Where such complete information is available, 
there is an alternative method of estimation which will be discussed 
later. 

It was previously remarked that when the population regression is 
linear, an unbiased sample estimate of the variance of $Y_l$ is obtained 
by substituting the residual mean square $s_d^2$ in place of $\sigma_y^2(1-\rho^2)$ in 
formula (2). With a non-linear population regression, $s_d^2$ is a biased 
estimate of $\sigma_y^2(1-\rho^2)$, but again the bias is inversely proportional to n, 
becoming negligible in large samples. 

To summarize, with large samples the linear regression estimate is 
unbiased, and the standard formula gives an unbiased estimate of its 
variance even when the population regression is non-linear. There is, 
however, a loss of efficiency which remains fixed in large samples. While 
no exact small-sample theory has been reached, it appears that both 
the estimate itself and the estimated variance are biased in small samples. 

WEIGHTED REGRESSIONS 


Thus far we have considered the case in which only the mean value 
of y changes as x changes. The variance of y may also change, particularly 
so if there is considerable variation in the areas of the sampling-units. 
The theory of regression has been extended to meet this case, 
provided that the ratios of the variances of different values of y are 
known exactly, a condition which rarely if ever holds in problems of this 
type. If the true residual variance of $y_i$ is $\sigma_i^2$, and $w_i = 1/\sigma_i^2$, the best 
unbiased linear estimate $Y_{wl}$ is 


$$Y_{wl} = N\{\bar{y}_w + b_w(\bar{x}_p - \bar{x}_w)\} \tag{12}$$

where $\bar{y}_w = S(w_iy_i)/S(w_i)$, $\bar{x}_w = S(w_ix_i)/S(w_i)$ are weighted sample 
means and $b_w = Sw_i(x_i-\bar{x}_w)(y_i-\bar{y}_w)/Sw_i(x_i-\bar{x}_w)^2$ is the weighted 
sample regression coefficient. For a fixed set of x's the sampling variance 
of $Y_{wl}$ is 

$$V(Y_{wl}) = N^2\left\{\frac{1}{S(w_i)} + \frac{(\bar{x}_p-\bar{x}_w)^2}{Sw_i(x_i-\bar{x}_w)^2}\right\} \tag{13}$$





It will be noticed in (12) that $Y_{wl}$ remains unchanged if instead of the 
correct weights $w_i$ we use numbers $w_i' = \lambda w_i$ which are proportional to 
the weights; i.e. only the relative weights assigned to different values of 
y need be known in order to calculate $Y_{wl}$. Formula (13) for the sampling 
variance cannot be used however unless the actual values of the 
weights are known. If only relative weights $w_i'$ are known, an unbiased 
sample estimate of (13) is given by 


$$s^2(Y_{wl}) = N^2\,\frac{Sw_i'(y_i-Y_i)^2}{(n-2)}\left\{\frac{1}{S(w_i')} + \frac{(\bar{x}_p-\bar{x}_w)^2}{Sw_i'(x_i-\bar{x}_w)^2}\right\} \tag{14}$$


where $Sw_i'(y_i-Y_i)^2/(n-2)$ is the weighted mean square of deviations 
from the sample regression, using $w_i'$ as weights. 
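
The following sketch shows how the weighted estimate and its estimated variance might be computed from relative weights. It assumes that formula (14) is the weighted residual mean square multiplied by $N^2\{1/S(w') + (\bar{x}_p-\bar{x}_w)^2/Sw'(x-\bar{x}_w)^2\}$; the function and argument names are hypothetical. 

```python
def weighted_regression_estimate(xs, ys, ws, N, xbar_p):
    """Return (Y_wl, s2) using relative weights ws.

    ws may be any positive multiple of the reciprocal residual variances;
    both returned quantities are unchanged when ws is rescaled.
    """
    n = len(xs)
    sw = sum(ws)
    xbar_w = sum(w * x for w, x in zip(ws, xs)) / sw
    ybar_w = sum(w * y for w, y in zip(ws, ys)) / sw
    swxx = sum(w * (x - xbar_w) ** 2 for w, x in zip(ws, xs))
    swxy = sum(w * (x - xbar_w) * (y - ybar_w) for w, x, y in zip(ws, xs, ys))
    b_w = swxy / swxx                                       # weighted regression coefficient
    y_wl = N * (ybar_w + b_w * (xbar_p - xbar_w))           # formula (12)
    # Weighted mean square of deviations from the fitted line, n - 2 d.f.
    wms = sum(w * (y - ybar_w - b_w * (x - xbar_w)) ** 2
              for w, x, y in zip(ws, xs, ys)) / (n - 2)
    s2 = N ** 2 * wms * (1 / sw + (xbar_p - xbar_w) ** 2 / swxx)  # assumed form of (14)
    return y_wl, s2
```

Multiplying every weight by a constant scales the weighted mean square up and the bracketed factor down by the same amount, so the estimated variance, like the estimate itself, depends only on the relative weights. 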

In practice, before these formulae can be used, it will be necessary to 
estimate the residual variances, and hence the weights, from the results 
of the sample and any other comparable data. Baker (1941) has recently 
discussed this problem for the case in which the x's fall into a 
number of distinct groups, all x's having the same value within each 
group. More generally, the x's will show a continuous range of variation. 
Space does not permit a detailed investigation of the best procedure 
for estimating the weights in this case. If the weights are presumed 
to change continuously as x changes, the first step seems clearly to subdivide 
the range of variation of x into a number of groups. The residual 
variance of y within each group can then be estimated by fitting an 
unweighted linear regression of y on x separately for each group. From 
these results, the relation between the residual variance and the area 
can be studied, and a smooth curve drawn to give the variance as a 
function of x. The weight to be assigned to any value of y is then obtained 
by noting the area of the sampling-unit, reading the curve, and 
taking the inverse of the variance. 
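
The first two steps of this procedure can be sketched as below. The sketch forms groups of equal count after sorting by area, which is one possible way of subdividing the range and is an assumption of this example; the function name is hypothetical, and the final smoothing of variance against area is left to the investigator. 

```python
def residual_variance_by_group(xs, ys, n_groups):
    """Split the sample into n_groups by area and fit an unweighted line in each.

    Returns a list of (mean area, residual mean square) pairs, one per group.
    Each group needs at least 3 observations and some spread in x.
    """
    pairs = sorted(zip(xs, ys))                   # order the sample by area
    size = len(pairs) // n_groups
    points = []
    for g in range(n_groups):
        start = g * size
        # The last group absorbs any remainder.
        stop = len(pairs) if g == n_groups - 1 else start + size
        gx = [p[0] for p in pairs[start:stop]]
        gy = [p[1] for p in pairs[start:stop]]
        m = len(gx)
        xbar, ybar = sum(gx) / m, sum(gy) / m
        sxx = sum((x - xbar) ** 2 for x in gx)
        b = sum((x - xbar) * (y - ybar) for x, y in zip(gx, gy)) / sxx
        rss = sum((y - ybar - b * (x - xbar)) ** 2 for x, y in zip(gx, gy))
        points.append((xbar, rss / (m - 2)))      # residual mean square, m - 2 d.f.
    return points
```

The returned points are those through which the smooth curve of variance against area would be drawn; the weight for a sampling-unit is then the reciprocal of the curve's value at its area. 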

The greater the number of groups, the more points are available for 
appraising the relation between variance and area. A further advantage 
of having many groups is that if the range of x is small within the 
groups, the within-group correlation between y and x may be negligible, 
so that the total within-group mean square of y may be used as equiva- 
lent to the residual mean square, thus obviating the necessity of fitting 
a regression within each group. However, as the grouping is made finer, 
the number of observations within each group decreases, leading to less 
accurate estimates of the within-group variances. The optimum num- 
ber of groups is not clear without further examination, though at a 
guess it seems advisable to have at least 20 observations in each group. 

The estimated weights are, of course, subject to sampling errors. 
These errors have two consequences. The estimate $Y_{wl}$ is not as accurate 
as it could have been made if the true weights had been known. 
This loss of accuracy is unavoidable, the somewhat laborious process 
described above for estimating the weights being an attempt to reduce 
the loss to a minimum. Secondly, and somewhat more seriously, both 
formulae (13) and (14) give biased estimates of the sampling variance 
of $Y_{wl}$ even in large samples, i.e. even ignoring the correction terms of 
order 1/n which have appeared in previous formulae. If $w_i'$ are the 
estimated and $w_i$ the true weights, the correct sampling variance of 
$Y_{wl}$ in large samples appears to be 

$$N^2\,S\!\left(\frac{{w_i'}^2}{w_i}\right) \Big/ \{S(w_i')\}^2. \tag{15}$$

Substituting $w_i'$ for $w_i$, formula (13) gives for large samples 

$$N^2/S(w_i') \tag{16}$$

while (14) gives, on the average, 

$$N^2\,S\!\left(\frac{w_i'}{w_i}\right) \Big/ nS(w_i'). \tag{17}$$

All three formulae agree if $w_i' = w_i$ for all i, this being the only case in 
which (16) gives the correct result, except by chance. However (17) 
also gives the correct result whenever $w_i' = \lambda w_i$ for any value of $\lambda$, and 
in general (17) is less subject to error than (16). Thus the process outlined 
above for estimating the weights should be regarded as leading 
merely to relative weights, formula (14) being used to estimate the sampling 
variance of $Y_{wl}$. If some idea can be formed of the probable 
magnitude of the errors in the estimated weights, formulae (15) and 
(17) can be compared to assess whether the estimated sampling error 
of $Y_{wl}$ is likely to be greatly or only slightly biased. 
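
The comparison suggested here is easily mechanized. The sketch below assumes that the three large-sample expressions are $N^2 S(w'^2/w)/\{S(w')\}^2$ for (15), $N^2/S(w')$ for (16) and $N^2 S(w'/w)/\{nS(w')\}$ for (17); the function names are hypothetical. 

```python
def true_large_sample_variance(N, w_est, w_true):
    """Assumed form of (15): N^2 S(w'^2 / w) / {S(w')}^2."""
    return N ** 2 * sum(we ** 2 / wt for we, wt in zip(w_est, w_true)) / sum(w_est) ** 2


def variance_from_13(N, w_est):
    """Assumed form of (16): what (13) reports when w' is substituted for w."""
    return N ** 2 / sum(w_est)


def variance_from_14(N, w_est, w_true):
    """Assumed form of (17): the average value reported by (14)."""
    n = len(w_est)
    return N ** 2 * sum(we / wt for we, wt in zip(w_est, w_true)) / (n * sum(w_est))
```

Trying a few plausible sets of true weights against the estimated ones gives a rough idea of how far (17) may depart from (15). When the estimated weights are exactly proportional to the true ones the two functions return the same value, whereas `variance_from_13` does not. 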

Formulae (15) and (17) also agree if all the estimated weights $w_i'$ are 
chosen equal, whether the true weights are equal or not. Thus, if we fit 
an unweighted regression, using $Y_l$ as the estimate, the formula previously 
given for the estimated sampling variance of $Y_l$ is still unbiased 
in large samples when the true weights vary. Considering how frequently 
the unweighted linear regression is used in statistical applications, 
it is reassuring to find that at least in large samples the standard 
formula for the estimated variance remains reliable even if the population 
regression is non-linear or if the true weights vary. 

In view of the labor involved in estimating weights and fitting a 
weighted regression, it may sometimes be questioned whether the gain 
in accuracy is sufficient to compensate for the extra work, particularly 
so if the true weights do not appear to vary greatly. To obtain some 
idea of the gain in accuracy, we may note from either (15) or (17) that 
if an unweighted regression is used, the variance of $Y_l$ is approximately 
$N^2S(1/w_i)/n^2$. From (13), the maximum possible accuracy attained by 
a weighted estimate is $N^2/S(w_i)$ to the same order of approximation. 
Thus the relative accuracy of $Y_l$ to $Y_{wl}$ cannot be less than approximately 

$$\frac{V(Y_{wl})}{V(Y_l)} = \frac{n^2}{S(w_i)\,S(1/w_i)} = \frac{n^2}{S(1/\sigma_i^2)\,S(\sigma_i^2)}. \tag{18}$$

By inserting a series of values of $\sigma_i^2$ to represent the range of variation 
in a practical case, this formula gives some idea of the relative accuracy 
attained by an unweighted regression. If the true variances do not 
change greatly, a rough approximation to (18) is $1/(1+c_v)$, where 
$c_v$ is the square of the coefficient of variation of the variances. Thus, for example, if 
the variance $\sigma_i^2$ appears proportional to the area of the sampling-unit, 
the relative accuracy is about $1/(1+c_x)$. 
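
A short calculation of this kind is sketched below. It assumes that (18) is $n^2/\{S(1/\sigma_i^2)\,S(\sigma_i^2)\}$ and, in the rough approximation, takes the quantity added to unity to be the squared coefficient of variation of the variances, in keeping with the way $c_x$ is defined earlier in the paper. The function names are hypothetical. 

```python
def relative_accuracy_unweighted(variances):
    """Assumed form of (18): n^2 / {S(1/sigma_i^2) S(sigma_i^2)}."""
    n = len(variances)
    return n ** 2 / (sum(1 / v for v in variances) * sum(variances))


def rough_approximation(variances):
    """1 / (1 + c_v), with c_v the squared coefficient of variation of the variances."""
    n = len(variances)
    mean = sum(variances) / n
    c_v = sum((v - mean) ** 2 for v in variances) / n / mean ** 2
    return 1 / (1 + c_v)
```

For residual variances of 1, 2, 3 and 4 the first function gives 0.768 and the second about 0.833, so the approximation is only rough when the variances change fourfold; for variances within five per cent of one another the two agree closely. 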


ESTIMATION BY THE MEAN PER UNIT AREA 


One type of weighted regression leads to an estimate which is particularly 
simple to calculate, and has proved serviceable in estimating crop 
acreages in agricultural sampling. If a weighted regression passes 
through the origin, the corresponding estimate $Y_{wo}$ is 

$$Y_{wo} = \frac{S(w_ix_iy_i)}{S(w_ix_i^2)}\,\Sigma(x_i). \tag{19}$$

Suppose that the variance of $y_i$ increases proportionally to the area 
$x_i$; in this case $w_i = 1/kx_i$, and (19) reduces to 

$$Y_a = \frac{S(y_i)}{S(x_i)}\,\Sigma(x_i). \tag{20}$$

Thus to calculate $Y_a$ (a for area), the sample total of y is divided by the 
total area of all sampling-units in the sample, giving a mean per unit 
area, which is then multiplied by the total area in the population. From 
the conditions mentioned above, $Y_a$ is a best unbiased linear estimate 
if the mean value and the variance of y both change proportionally to x. 
Goldberg (1942) has studied the sampling distribution of $Y_a$ for any 
type of joint frequency distribution of y and x. He has shown that in 
general $Y_a$ is biased, the leading term in the bias being 

$$\frac{N\bar{y}_p}{n}\,(c_x - \rho\sqrt{c_xc_y}) \tag{21}$$

while the first approximation to the variance is 

$$V(Y_a) = \frac{N^2\bar{y}_p^2}{n}\,(c_x + c_y - 2\rho\sqrt{c_xc_y}). \tag{22}$$

Thus in large samples the ratio of the bias to the standard deviation is 
proportional to $1/\sqrt{n}$. By examining the ratio as a function of $\rho$, it may 
be shown that the ratio cannot numerically exceed the coefficient of 
variation of x, divided by $\sqrt{n}$. 
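
The mean-per-unit-area estimate and the bound on the relative bias can be illustrated together. The sketch assumes that the leading bias term is $(N\bar{y}_p/n)(c_x - \rho\sqrt{c_xc_y})$ and that the first approximation to the variance is $(N^2\bar{y}_p^2/n)(c_x + c_y - 2\rho\sqrt{c_xc_y})$, with $c_x$ and $c_y$ the squared coefficients of variation; the function names are hypothetical. 

```python
import math


def mean_per_unit_area_estimate(xs, ys, total_area):
    """Sample total of y over sample total area, times the population total area."""
    return sum(ys) / sum(xs) * total_area


def bias_to_sd_ratio(n, c_x, c_y, rho):
    """Ratio of the leading bias term (21) to the square root of the variance (22).

    The common factor N * ybar_p cancels, so only n, c_x, c_y and rho are needed.
    """
    cross = rho * math.sqrt(c_x * c_y)
    return (c_x - cross) / math.sqrt(n * (c_x + c_y - 2 * cross))
```

Evaluating `bias_to_sd_ratio` over a range of values of rho, with the other arguments held fixed, shows the ratio staying within the coefficient of variation of x divided by the square root of n, as stated in the text. 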

Since $Y_s$ and $Y_a$ are the two simplest estimates to calculate, it is of 
interest to compare their sampling variances. For samples sufficiently 
large so that (22) may be used as the variance of $Y_a$, Goldberg (1942) 
has shown that $Y_a$ has a smaller variance than $Y_s$ whenever $\rho > \tfrac{1}{2}\sqrt{c_x/c_y}$ 
and vice versa. 


ESTIMATION BY USING POPULATION WEIGHTS 


If a complete tabulation of the areas of all sampling-units in the population 
is available, the areas can be sub-divided into groups or strata, 
an estimate of the total of y being made for each stratum. While this 
procedure could be carried out with all the estimates previously discussed, 
this investigation will be confined to the simplest estimate $Y_s$. 
If $n_1, \ldots, n_k$, $N_1, \ldots, N_k$ are the numbers of sampling-units in the sample 
and population respectively for the k groups, the estimate $Y_{ws}$ of the 
population total over all strata is 


$$Y_{ws} = (N_1\bar{y}_1 + \cdots + N_k\bar{y}_k) \tag{23}$$

the sampling variance being 

$$V(Y_{ws}) = \left(\frac{N_1^2\sigma_1^2}{n_1} + \frac{N_2^2\sigma_2^2}{n_2} + \cdots + \frac{N_k^2\sigma_k^2}{n_k}\right) \tag{24}$$

where $\sigma_1^2, \ldots, \sigma_k^2$ are the within-strata variances of y. 

As shown by Neyman (1934), this variance is smallest, for a fixed 
total size of sample, when the sample is distributed amongst the groups 
so that $n_i$ is proportional to $N_i\sigma_i$. To retain comparability with previous 
estimates, however, we will assume that the sample is chosen at 
random. 
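
For reference, the allocation mentioned here and the variance it is meant to minimize can be sketched as follows. The function names are hypothetical, the allocation is returned unrounded, and the variance is taken to be the sum of $N_i^2\sigma_i^2/n_i$ over the groups. 

```python
def neyman_allocation(n, stratum_sizes, stratum_sds):
    """Allocate n so that n_i is proportional to N_i * sigma_i (unrounded)."""
    products = [N_i * s_i for N_i, s_i in zip(stratum_sizes, stratum_sds)]
    total = sum(products)
    return [n * p / total for p in products]


def stratified_variance(stratum_sizes, stratum_vars, allocation):
    """Sum over strata of N_i^2 sigma_i^2 / n_i."""
    return sum(N_i ** 2 * v / n_i
               for N_i, v, n_i in zip(stratum_sizes, stratum_vars, allocation))
```

With two groups of 100 and 200 sampling-units having standard deviations 1 and 2, a sample of 50 is divided 10 and 40, giving a variance of 5,000 against 5,400 for an allocation proportional to the group sizes. 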

In large samples, the average value of (24) works out approximately 
as 

$$\frac{N^2}{n}\left\{\left(1-\frac{1}{n}\right)\frac{N_1\sigma_1^2+\cdots+N_k\sigma_k^2}{N} + \frac{k}{n}\,\frac{(\sigma_1^2+\cdots+\sigma_k^2)}{k}\right\} \tag{25}$$

where n and N are as before the total numbers of sampling-units in the 
sample and population respectively. The expression inside the curled 
brackets contains both a weighted and an unweighted mean of the 
within-strata variances of y. 

If all within-strata variances are the same, this reduces to 

$$\bar{V}(Y_{ws}) = \frac{N^2\sigma^2}{n}\left(1 + \frac{k-1}{n}\right). \tag{26}$$








By increasing the number of x-groups with a given sample, the 
within-group variances are presumably decreased, since that portion of 
the variances of y which is due to variation in x is decreased by cutting 
down the range of x within each group. However, the factor involving k 
is of course increased as k increases, so that a point is reached beyond 
which a further increase in the number of groups will result in less accuracy. 
From (26) it follows that in the case of equal variances the 
factor involving k is relatively unimportant provided that (k−1)/n is 
less than say .05, which holds if the average number of observations per 
group exceeds 20. 
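
A sketch of the estimate by population weights, together with the factor involving k, is given below. The function names are hypothetical; the variance is estimated by putting the sample within-group mean squares in place of the within-strata variances, and each group is assumed to contain at least two sample observations. 

```python
def population_weight_estimate(strata, stratum_sizes):
    """Return (estimate of the population total, estimated variance).

    strata        : list of lists, the sample y-values falling in each area group
    stratum_sizes : N_1 ... N_k, the population counts for the same groups
    """
    total, var_est = 0.0, 0.0
    for ys, N_i in zip(strata, stratum_sizes):
        n_i = len(ys)
        mean = sum(ys) / n_i
        ms = sum((y - mean) ** 2 for y in ys) / (n_i - 1)   # within-group mean square
        total += N_i * mean                                   # one term of (23)
        var_est += N_i ** 2 * ms / n_i                        # one term of (24)
    return total, var_est


def grouping_factor(k, n):
    """The factor 1 + (k - 1)/n appearing in (26)."""
    return 1 + (k - 1) / n
```

For example, `grouping_factor(5, 100)` is 1.04, so five groups in a sample of 100 inflate the variance by four per cent before any reduction in the within-group variances is counted. 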

On comparing (26) with (3), $Y_{ws}$ is found to be somewhat less accurate 
than the linear regression estimate $Y_l$ if the true population regression 
is linear with equal variances. This follows because the within-stratum 
variance $\sigma^2$ cannot be less than $\sigma_y^2(1-\rho^2)$, while the additional 
factor in 1/n is also larger for $Y_{ws}$ than for $Y_l$. This conclusion was to be 
expected, since under the conditions mentioned $Y_l$ is a best unbiased 
linear estimate. If however the relation between y and x is markedly 
curvilinear or discontinuous, $Y_{ws}$ may be superior to $Y_l$, since the variation 
in y arising from any type of relation with x can be reduced by a 
suitable choice of strata, whereas $Y_l$ eliminates only the effects of the 
linear component of the relationship. Moreover, $Y_{ws}$ is an unbiased 
estimate for any type of relation between y and x and any size of sample. 
Similarly, an unbiased estimate of the variance of $Y_{ws}$ is always obtained 
by substituting the sample within-strata mean squares in (24). 

Similar comparisons can be made between $Y_{ws}$ and the weighted 
linear regression estimates by means of the formulae given for the sampling 
errors. Goldberg (1942) has discussed briefly the properties of 
$Y_{wa}$, the corresponding weighted estimate derived from the sample 
mean per unit area within each group. 


FURTHER NOTES 


Some apology is needed for presenting in the previous sections a num- 
ber of large-sample approximations without guidance as to the limits 
within which these apply. Unfortunately these limits depend on the 
form of the joint frequency distribution of x and y, and could not be 
specified more definitely without a classification of the types of fre- 
quency distribution. Moreover, in extensive surveys, where problems of 
organization are difficult, biases may arise through the method of se- 
lecting the sample, incompleteness in the returns, and errors in report- 
ing or recording the data. Such biases, while affecting the accuracy of 
the estimates, may not be measured by the formula for the sampling 
error, so that a rough approximation to the sampling error is often suf- 
ficient for practical purposes. 
















If the correct form of regression is used, population estimates de- 
rived from regressions remain unbiased in non-random sampling, pro- 
vided that all sampling-units with the same area have an equal chance of 
selection. Thus the large sampling-units might be allotted a greater 
chance of inclusion in the sample, this procedure giving a more accurate 
estimate whenever the variance of y increases as z increases. On the 
other hand, if the method of selection discriminates in favor of certain 
sampling-units amongst those of the same area, bias may arise. 

The formulae in this paper will of course apply to any variable x
which is correlated with y. For example, in agricultural sampling, where 
the sampling-unit is sometimes a fixed area of land, x may be taken as 
the number of farms in the area, the total farm land or the total crop 
land, according to which gives the highest correlation with y. 

In developing correction terms to be applied where an appreciable 
fraction of the population is sampled, the initial difficulty is that of defining
a regression in a finite population. Writing y = α + βx + e, we may
suppose that e has no linear correlation with x in the finite population,
but if we attempt to postulate that e is uncorrelated with any power of
x, the number of conditions to be satisfied is greater than the number of
values of e available, so that e and x cannot be independently distributed
in the sense in which this term is applied with infinite populations.
An alternative approach is to regard the finite population as a random
sample from an infinite population in which e and x are independent.
From a preliminary investigation and from Goldberg’s (1942) work, it
appears that the first approximation consists in multiplying formulae
(2), (3) and (22) for the sampling-variances, and formulae (11) and (21)
for the biases by (N − n)/N, this being the same correction as in the case
of the sample-mean estimate Y,. In formula (24), each term is multiplied
by the corresponding factor (N_i − n_i)/N_i. For Y., the estimate
derived from a straightline regression through the origin, and Y.:, the 
weighted linear regression estimate, further investigation is needed. 
The difficulty arises because these two estimates do not equal the true 
population total when the sample consists of the whole finite popula- 
tion; i.e. they are inconsistent in the sense of Fisher (1941) whereas 
Y,, Yi, Ya, and Y,; are always consistent. 
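
The first-approximation correction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the numerical values are hypothetical and do not come from the paper; only the factor (N − n)/N is taken from the text.

```python
def finite_population_correction(N: int, n: int) -> float:
    """Return the factor (N - n) / N for a sample of n units out of N."""
    if not 0 < n <= N:
        raise ValueError("require 0 < n <= N")
    return (N - n) / N


def corrected_variance(large_sample_variance: float, N: int, n: int) -> float:
    """Multiply a large-sample sampling variance by the correction factor."""
    return large_sample_variance * finite_population_correction(N, n)


# Hypothetical figures: 200 of 1,000 sampling-units are sampled and the
# uncorrected sampling variance is 50.0.
print(corrected_variance(50.0, N=1000, n=200))
```

When the sample is the whole finite population (n = N) the factor is zero, which agrees with the remark that a consistent estimate then equals the true population total.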


For sampling surveys in which the areas x of the sampling-units are 
unequal, the properties of various estimates of the population total of 
some observed quantity y are discussed, these estimates being mostly 
derived from the regression of y on x. In order of ease of calculation, the
estimates are as follows: Y,, derived from the sample mean per sampling-unit;
Y,, derived from the sample mean per unit area; Y,,, a
weighted form of Y,, using population weights; Y, and Y;, based on un- 
weighted linear regressions; and Y,. and Y,., using weighted regres- 
sions. The conditions under which each estimate is most efficient are 
described, with various comparisons of their relative efficiencies.


THE IMPORTANCE OF HOSPITAL MORBIDITY
DATA FOR THE COMMUNITY* 


By MARTA FRAENKEL
Welfare Council of New York City 


OFFICIALS and others responsible for administrative planning have
had the benefit of a periodic census giving a solid basis of demographic
and economic data on the total population. In one vast and
important field, however, they have been forced to plan without this 
basis of statistics. This field is the care of the sick—the provision of 
medical and nursing care and the provision of hospitals. 

Need for information in this field has always been present, but today 
that need is especially urgent. In recent decades the scene has been 
changing: A successful fight against acute infectious disease has pushed 
this group of conditions into the background while an aging population 
has brought chronic diseases more and more into the foreground. 

The volume of disease is large and diseases vary greatly in kind and 
in seriousness; the borderline between sickness and health is not always 
clear cut. The combination of these factors may be the reason that a 
general periodic census of disease by diagnosis has never been launched. 
Knowledge of the occurrence of specified diseases in the population as a 
whole has been restricted to that which can be gained through an 
analysis of the causes of death and through the current data on the 
communicable and occupational diseases that are reportable by law. 

Mortality statistics, it is true, have been developed to a high plane 
but data on the causes of death permit only limited conclusions as to 
the total volume of sickness which calls for medical, nursing or hospital 
care of the patients. 

In the past, special studies have produced a body of interesting and 
useful information. But these studies have been restricted in size or in 
scope. The National Health Survey, by its very size and character, 
cannot be considered as the cornerstone of a routine reporting system 
that might furnish the periodical data necessary to community plans 
for the care of the sick. 

Another approach is necessary. Since the reporting, as a routine pro- 
cedure, of all illness in the population appears to be too difficult both 
on account of its vastness and its inclusion of ill-defined minor condi- 
tions, the diagnostic data on hospitalized patients are suggested as a 
regular source of information, especially in large urban areas. 


* A paper presented at the 103rd Annual Meeting of The American Statistical Association, New 
York, December 29, 1941.

It is admitted that data on hospitalized morbidity are a compromise.
Is this compromise acceptable? This question may be answered in the 
affirmative if the limits of the information made available are perma- 
nently kept in mind. Data on hospitalized morbidity reflect an un- 
known proportion of the total volume of illness, but it is safe to assume 
that they would afford a fairly complete record of the major diseases. 
Most of the minor conditions will, however, be excluded. One might 
wonder whether their inclusion would really prove an asset. Might not 
the enormous volume of minor illnesses and the absence of precise 
definition of such illnesses confuse the picture? 

The collection from hospitals of morbidity data classified by disease 
should not meet insurmountable difficulties. These data would have to 
be recorded and collected by each hospital in accordance with standard 
rules and then to be forwarded to a central collecting agency. Since, for 
many years, some information has been thus reported, the major innovation
would be the inclusion of the patients’ medical diagnoses, recorded
in accordance with a suggested classified list of diagnoses.¹ In
suggesting the collection of data on hospitalized patients, the sources
of morbidity data are limited to a known and not too high number. The
record rooms of the hospitals are professionally directed by librarians 
accustomed to similar work and no doubt able to take up the additional 
task in a competent way. 

The volume of hospitalized sickness without specification by diag- 
nosis has been known for some years: the municipal hospitals report 
to their central administration; the voluntary hospitals, if they provide 
care to patients who are public charges, have to report to public welfare 
departments, and if they participate in a community drive for the 
support of voluntary hospitals, to the fund-raising agency. Only proprietary
hospitals are exempt from the obligation of any central reporting.

The a priori assumption that the hospitals’ own reports might serve 
as a source of information on diseases of patients receiving hospital care, 
is for the present not correct. Most of the annual reports focus not on 
the patients treated for specific diagnoses, but on the service rendered 
by the several departments of the institution. In New York City, where 
the Department of Hospitals maintains a well-developed statistical 
service, the reports of the municipal hospitals include data on diagnoses.

1 A Classified List of Diagnoses for Hospital Morbidity Reporting, Welfare Council Publications
1939, Volume IX, Research Bureau, Welfare Council of New York City.

A demonstration project, the Hospital Discharge Study, was undertaken
by the Work Projects Administration² and the Research Bureau
of the Welfare Council of New York City, to explore the possibilities 
of hospital morbidity data with specification by diagnosis and the feasi- 
bility of their collection. This study, made under the direction of Dr. 
Neva R. Deardorff, covered 576,623 discharges, during the year 1933, 
from 113 of the 134 hospitals then in existence in New York City. The 
information secured through the tabulation and analysis of these rec- 
ords is used to illustrate some of the comments which follow. 

The two main elements determining the need for hospital care in a 
community are variable; that is, the proportion of patients suffering 
from specific diseases is not constant and the need for hospital care of 
patients with these diseases is changing. 

The changes in the disease picture of a community mainly result 
from the changing composition of the population: A decrease in the 
birth rate and an increase in the average age are the main factors, with 
occupational changes being a third. In 1900, 30.7 per cent of the popu- 
lation in New York City were under 15 years of age and 2.8 per cent 
over 65; in 1940 the corresponding figures were 19.8 and 5.5 per cent, 
respectively, according to the data of the Bureau of Vital Statistics 
of the New York City Department of Health.⁴

What does such a change in the age composition of the population 
mean in terms of disease? Certain communicable diseases, such as 
scarlet fever, measles and diphtheria, are recognized to be the classic 
childhood diseases. In addition, such conditions as, for instance,
rheumatic fever, mastoiditis and otitis media, are most common among 
the younger age groups. The Hospital Discharge Study, referred to 
above, shows for New York City among the two age groups of early 
life (under 5 years and 5 to 14 years), 352 and 199 discharges with the 
diagnosis of mastoiditis and 356 and 79 with otitis media per 100,000 
of the population in these age groups. These acute conditions have a 
very small incidence in old age, namely, 24 and 17, respectively, per 
100,000 persons of 65 years and over. 

2 Work Projects Administration of the City of New York, O. P. No. 65-1-97-21 W. P. 6.

3 Hospital Discharge Study, An analysis of 576,623 patients discharged from hospitals in New York
City in 1933, by Neva R. Deardorff, Ph.D., and Marta Fraenkel, M.D., Volume I, “Hospitals and
Hospital Patients in New York City” (in press), Volume II, “Hospitalized Illness in New York City,”
and Volume III, “Reporting of Illness by Hospitals” (in preparation).

4 Quarterly Bulletin, Department of Health, City of New York, Volume IX, No. 3, 1941.

Among hospital patients of 65 years of age and over there were per
100,000 population of this age group 1,049 discharges with the diagnosis
of malignant neoplasm, 798 with cerebral hemorrhage, 2,535 with cardiac
disorders and 2,110 with vascular conditions. These few examples
may suffice to show the changes in the disease picture of a community
which parallel changes in the age composition of the population. Cur- 
rent knowledge of the prevalence of diseases within given age groups 
is the first step for a proper preparation to meet the need for care. 

The second inconstant factor in the picture is the changing need for 
hospital bed care for particular diseases. The demand for hospital beds 
may be affected by new therapeutic measures; the switch from surgical 
intervention to radiation in the treatment of malignant neoplasms of 
selected sites, for instance, has reduced the amount of hospital bed care 
needed by cancer patients. To forecast changes in the volume of hos- 
pital care needed, knowledge of the occurrence of the underlying dis- 
eases is a prerequisite. 

Changes in the need for hospital bed care may also result from the 
adoption of new policies. Hospital bed care is only one method of caring 
for sick people. It is not always the most appropriate one; it is ex- 
pensive and patients, especially old, chronically sick persons, are fre- 
quently unwilling to stay in the hospital for extended periods. Care in 
institutions such as homes for the chronically sick, homes for the infirm 
aged, or convalescent homes is more economic to the community and 
often is preferred by the patient. In many instances, institutional care 
of patients with chronic diseases or of convalescent patients can be pre- 
vented or curtailed if social work planning based on the knowledge of 
the existing demand provides an adequate supply of extra-institutional 
care. Visiting nurse services and visiting housekeeper services, supple- 
mented by medical supervision, may be the tools used in such a pro- 
gram. But any planning for these services must be based upon knowl- 
edge of the occurrence and distribution of the underlying diseases. 

What type of information can be expected from data on hospital 
morbidity? Five problems on which the study of hospital discharge 
data threw some light may be cited as examples. 

First: What are the diseases found most frequently among hospital 
patients? 

Obstetrical work, which is not a fight against disease but constructive 
health work, according to the data of the Hospital Discharge Study, ac- 
counts for a larger share of New York City hospital patients than any 
other condition. Among the 576,623 discharges, the 59,684 delivered 
women and the 58,507 live births amounted to 20.5 per cent of the 
total discharges. 

Tonsil conditions with 68,384 discharges, or 11.9 per cent of the total, 
were the second largest single item. Approximately 17,100 cases, or 3.0 
per cent of the total, were diagnosed as acute appendicitis; 16,008 cases,
or 2.8 per cent, as tuberculosis, and 10,362 cases, or 1.8 per cent, as
inguinal hernia. 
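
The percentages quoted above can be re-derived from the discharge counts given in the text. The short Python sketch below does so; the dictionary labels and the helper name `share_percent` are illustrative choices, not terms from the study, and the appendicitis count is the approximate figure stated above.

```python
TOTAL_DISCHARGES = 576_623  # discharges covered by the Hospital Discharge Study

# Counts as quoted in the text.
counts = {
    "obstetrical (delivered women + live births)": 59_684 + 58_507,
    "tonsil conditions": 68_384,
    "acute appendicitis (approximate)": 17_100,
    "tuberculosis": 16_008,
    "inguinal hernia": 10_362,
}


def share_percent(count: int, total: int = TOTAL_DISCHARGES) -> float:
    """Share of all discharges, in per cent, rounded to one decimal place."""
    return round(100 * count / total, 1)


shares = {name: share_percent(count) for name, count in counts.items()}
for name, pct in shares.items():
    print(f"{name}: {pct} per cent")
```

Run as written, the shares come out at 20.5, 11.9, 3.0, 2.8 and 1.8 per cent, in agreement with the figures cited.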

The second question is: What is the occurrence of various diseases 
when the data are analyzed by sex, age and color of the hospital pa- 
tients in relation to the total population of the same sex, age and color? 

If the female patients whose hospital stay was caused by
childbirth are eliminated from the figures, male hospital patients form a
slightly higher proportion of the total than do the females, namely,
53.8 per cent. This is caused by the high occurrence of preponderantly 
“male conditions”: 68 per cent of the 21,302 patients with fractures, 
79 per cent of the 13,424 with hernias and 86 per cent of the 15,576 
with alcohol poisonings were male. 

The importance of the age data has already been stressed above. 

The distribution of the discharges by color shows 71 white hospital 
patients per 100,000 white population, but 126 Negro hospital patients 
for the corresponding population group. Tuberculosis and venereal 
diseases contribute to the excess of hospitalized Negro patients with 
lo.., 31.8 (syphilis) and 22.1 (gonorrhea) per cent of the total cases
respectively. Ulcer, scarlet fever and diabetes mellitus are some of the 
diseases for which Negroes are less frequently hospitalized; Negroes 
constitute 3.8, 4.8 and 5.0 per cent of the total cases respectively. 

How greatly does the length of stay in a hospital vary with the par- 
ticular disease? This third question is important to any planning. An 
estimate of the number of hospital beds needed must be based on the
distribution of the days’ care needed by patients with various condi- 
tions. Long as well as short hospital stays were caused by acute as well 
as by chronic conditions. 

The data, moreover, show to what extent the amount of hospital care 
needed by patients with a given diagnosis depends upon economic fac- 
tors. For some diseases, patients can be discharged at an earlier period of 
convalescence from voluntary hospitals than from municipal hospitals; 
for instance, 75.6 per cent of the acute appendicitis patients were dis- 
charged from voluntary hospitals after a stay of 8 to 14 days but only 
44.2 per cent of those in municipal hospitals were discharged so early. 

What is the mortality rate among hospital patients? This is a fourth
question.

The mortality in hospitals of patients covered by the Hospital Discharge
Study was 5.6 per cent of the total. This rate for all diagnoses
hides rates for specific diseases which range from 0.7 for obstetrical and 
some gynecological conditions to 49.8 per cent for septicemia and 60.9 
per cent for peritonitis. The deaths in hospitals cover only 42.2 per cent
of the total cardiac deaths in the city but 100.0 per cent of the deaths
caused by epidemic encephalitis. 

Planning for adequate medical and nursing care should take into 
consideration the fact that patients with some diseases are transferred 
to hospital care in an extreme condition and that those with other dis- 
eases often stay in the hospital for weeks and months until death comes. 

Fifth and last: Can hospital facilities be planned in accordance with 
the population of an area? 

The planning for certain community facilities such as schools, 
churches and recreational facilities, is based on the residence of the pre- 
sumptive clientele. It might be expected that planning for hospital care 
follows a similar pattern, that the “neighborhood hospital” should be 
the rule. The Hospital Discharge Study shows, however, certain un- 
expected facts: People do not necessarily go to the hospital located close 
to their residence; even if the neighborhood provides sufficient hospital 
facilities, they migrate on a city-wide scale. In New York City not only 
neighborhood but even borough boundaries are disregarded. Migration 
of this type is more common among the patients of voluntary hospitals 
than among those in municipal hospitals, but it exists in the latter 
group too. The fact that personal reasons and not the availability of 
proper local hospital accommodations influence a patient’s choice of a 
hospital is best illustrated by the following example: Health area 59 
in Manhattan has within its boundaries six general and one special 
hospital, some of them of a nation-wide reputation, with a total of 
1,345 beds. The area has a population of 14,256 persons, 2,748 of whom 
needed hospital care during the year studied. These 2,748 persons went 
to 73 hospitals all over New York City; only 665 of them went to the
seven institutions in the area.
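
The Health area 59 example can be put in proportional terms with a small Python sketch. The percentages below are derived here from the counts quoted in the text and are not stated in the study itself; the variable names are illustrative.

```python
# Figures for Health area 59 in Manhattan, as quoted in the text.
population = 14_256
residents_hospitalized = 2_748
treated_in_area = 665  # residents who went to the seven hospitals in the area

# Share of residents who needed hospital care during the year studied.
hospitalized_pct = round(100 * residents_hospitalized / population, 1)

# Share of those hospitalized residents who used a hospital in their own area.
local_pct = round(100 * treated_in_area / residents_hospitalized, 1)

print(hospitalized_pct)  # 19.3
print(local_pct)         # 24.2
```

On these figures, roughly three of every four hospitalized residents went to a hospital outside their own well-supplied health area.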

Planning of health facilities for the population in the future will have 
to consider the hospital no longer as an institution per se, but adminis- 
tratively, as a link in a chain of health services, and, geographically, as 
but one part, though an important part, of the regional health service. 
If and when plans, which coordinate institutional bed care, out-patient
care, health education and other weapons of sickness prevention,
are realized, the somewhat chaotic flow of patients will cease and
give place to organized care on regional principles. The first condition 
for a successful planning of the health service of the future is the knowl- 
edge of the strength of the enemy one will have to face, that is, the 
occurrence of diseases in a given population. 

In summary, regularly reported data on the major causes of illness in 
a community are important:

(a) To show the changing need for medical, nursing and hospital care,
resulting from improved methods in preventing and curing diseases, 
and from the changing age composition of the population; 

(b) To show the needs of groups of the population characterized by 
special demographic and economic features; and 

(c) To aid with the preparation and maintenance of a master plan 
of medical, nursing and social care for the sick in a community, a need 
that becomes more and more urgent since chronic conditions, with long-range
and complex needs, steadily gain in importance in the total disease
picture of the population.








A CRITICAL APPRAISAL OF BUSINESS STATISTICS* 


By JOHN S. PERKINS, Assistant to the President
Boston University 


INDUSTRY has long followed Walter S. Gifford’s dictum that “Business
management is based upon facts.” And the “facts” which the ordinary
business executive is concerned with can be described as ninety
per cent internal and ten per cent external. Teachers and writers on 
business statistics, on the other hand, have not yet come around to this 
understanding of the role which statistics play in the actual manage- 
ment of a business firm. In contrast with the business executive’s point 
of view, the statistics and data that teachers and writers deal with can 
be described as ninety per cent external and ten per cent internal. Thus, 
it can be said that business statistics is not now fully meeting the 
business man’s specialized needs in the statistical field. 

For some time there has been a growing recognition in academic 
circles that the ordinary business statistics course does not bridge the 
gap between statistical technique and the actual use of statistics in 
business. Professor Theodore H. Brown of Harvard University warned 
against the disparity between the “preaching” and “practicing” of busi- 
ness statistics at a meeting of the American Statistical Association in 
December, 1936. On this occasion he stated that business statistics 
courses have been of little assistance in solving the specific problems 
that are actually faced by the average business man or by the young 
man in the position in which he will start his business career.¹

More recently there have been other indications that this shortcom- 
ing is being recognized. Writing in the 1940 Summer Number of the 
Harvard Business Review, in a penetrating and sagacious article, “Sta- 
tistics Takes a Second Breath,” Professor Charles A. Bliss reported the 
result of his analysis of seventeen recent statistical textbooks, concern- 
ing which he stated, “It is very doubtful if such an array of outstanding 
(statistical) books, ever appeared in so short a period before.” In his 
analysis of the contents of these books, Professor Bliss took pains to 
stress the significant fact that in only one book, Riggleman and Frisbie, 
Business Statistics (McGraw-Hill Co., New York, 1938) is material 
presented relating to actual statistical practices involved in the internal 
management of an individual business firm. 

* A paper presented at the 103rd Annual Meeting of the American Statistical Association, New York,
December 27, 1941.
1 This JOURNAL, Vol. 34, No. 206, June 1939, pp. 299 and 302.

Another indication was noted at a discussion meeting of teachers of
statistics held during the Annual Meetings of the American Statistical
Association in Philadelphia in December, 1939. Here the question was 
raised as to whether or not more attention should be given to internal 
statistics in college business statistics courses. According to the testi- 
mony of the representative group of teachers present at this meeting, 
the use and application of statistics in actual business management are 
seldom, if at all, incorporated in the average business statistics course. 
In reporting on the conference in the Bulletin of the American Statisti- 
cal Association (May, 1939) Professor J. R. Stockton of Texas Uni- 
versity summarized the opinion of those present when he said that 
“This represents an important omission in the offerings in strictly 
business statistics.” 

Here is a rich field awaiting cultivation. It is related to statistical and 
accounting techniques on the one hand, and to management theory and 
practice on the other hand. The area can most properly be called 
“Management Statistics.” 

It covers that field of statistical activity involving the analysis and 
use of internal operating data in specific departments and functions of 
individual business establishments. It is primarily the technique of 
collecting and analyzing management facts for the purpose of solving 
and controlling management problems. 

Business statistics courses are not filling the need for training in this 
field. At its best, the ordinary business statistics course is an adaptation 
of the more simple and elementary statistical techniques to a pseudo- 
business setting based largely on general business and economic data. 
The average business statistics course now conducted provides no in- 
sight into the application of statistical and fact-finding techniques and 
principles in actual management practice. It bears little relationship to 
“statistics in business.” College students in business administration go 
out with little or no appreciation of the nature, and use, and value of 
management facts in the actual internal administration of an enterprise.
The statistical profession as a whole, statisticians, teachers, and
writers, has virtually ignored the entire field.

The development of the field of Management Statistics will have to 
start from scratch. Teachers will have to prepare themselves in the 
field. Problem material must be developed. Literature in the field must 
be encouraged, and textbooks and course syllabi prepared. And pre- 
ceding all this will have to come a framework, or classification, which 
can be used in sorting out and placing in logical position the various 
principles and practices as they are uncovered and explained. 

The most important requisite is teaching leadership. To a large extent,
teachers of business statistics have been unaware of Management
Statistics simply because they have failed to keep in touch with the 
actual role played by statistics in business. They have been smoothing 
out curves of hypothetical business indices, while the business man has 
been trying to smooth out curves of machine loads, price lines, inven- 
tories, and the like. The result is that the student of so-called business 
statistics receives neither fish nor fowl. The statistical theory he gets 
is superficial, and the application of statistics to internal management 
is virtually non-existent. 

The obvious answer is that business statistics teachers must re-estab- 
lish contact with store and factory, with office and warehouse. Further- 
more, this contact must be of a special nature. An occasional plant visit 
is not enough. They must go into the various departments of a business, 
see how the data originate, understand the units of measure and the 
technique of recording the information, and then follow them through 
various levels of authority and use and detail until they emerge in the 
form of a control report for top management. Because one is dealing 
here with the very nervous system of an individual business and with 
the confidential data which this implies, no mere speaking acquaintance 
with industry will suffice. There must be a real merging of interests and 
desires on the part of business management and teachers of statistics 
based on mutual understanding and desire to contribute something 
of value. 

The business man himself should help if the field of Management 
Statistics is to be adequately developed. His most important contribu- 
tion is to cooperate with the academic and professional world in supply- 
ing facilities for basic investigations of Management Statistics. 

He can also cooperate by helping to determine the direction in which 
attention should be given to the field. Every executive worth his salt in 
business today must rely substantially on statistical records, reports, 
and analyses for knowledge of his own business and as a basis for his
plans and programs. The business man knows best his needs for control 
data and how the facts are gathered and analyzed and presented. For 
these reasons, he can be of considerable value and help to the research 
worker in analyzing this management nervous system to find out what 
it consists of, how it operates, and how it can be most effectively used. 

The development of literature in the field is an important need. Of 
the many different categories of Management Statistics, market sta- 
tistics, and sales forecasting and, to a lesser extent, personnel statistics, 
are the only types covered in present-day business statistics courses and 
texts with any degree of completeness. In accounting literature there
exists much of value dealing with that important branch of Management
Statistics related to budget and cost statistics and reports. However,
until the many aspects of Management Statistics can be brought
together in one literature, little progress can be made in formulating a 
body of theory and practice in this field. 

Management Statistics should be recognized now as a definite branch 
of the statistical profession. This field represents one of the most im- 
portant techniques in present-day business management. 

The modern business man bases his decisions on facts. In one sense 
his staff organization exists for the sole purpose of supplying these facts 
in usable form. Without a system of organized statistics and reports, 
management would be paralyzed. There could be little delegation of 
authority without Management Statistics. In fact, the effectiveness of 
a firm is largely dependent on the choice and use of management sta- 
tistics and analyses which keep the executive advised as to his course 
and his route. Yet little recognition has ever been given in professional 
and academic circles to this important field. 

Full understanding and utilization of Management Statistics is es- 
pecially necessary in a period of intense business activity and expanding 
operations. Under such conditions, personal oversight of the many and 
varied aspects of management is next to impossible. Decisions must be 
made quickly and accurately and it is absolutely essential that the 
executive have all the facts at his command, properly analyzed, inter- 
preted, and presented. 

From a vocational standpoint, training in the nature and scope of 
Management Statistics would pay almost immediate dividends. Most 
college graduates starting out in business must work to a considerable 
degree with records and reports dealing with various phases of a busi- 
ness. A person who has viewed the entire field of Management Sta- 
tistics and is familiar with the nature and use of management records 
and reports is better qualified right from the start to display that added 
ability which marks him from the rest, thus laying the foundation for 
future executive success. One of the best ways of getting ahead in any 
organization is to display the ability to handle figures well, and to ob- 
tain and present effectively information on any type of management 
problem. After the basic traits of knowledge and judgment, this ability 
depends largely on a familiarity with Management Statistics. 

Furthermore, attention to the field of Management Statistics would 
improve the use and value of management statistics themselves. There 
are instances in almost every company where reports and statistics are 
compiled in great detail by statistical departments without any one
finding any use for the figures thus prepared. Mr. Emil Hoofsos long
ago summed up the situation by saying that, “Unfortunately, in too 
many businesses, the executives are not statisticians, and the statis- 
ticians are not business men.”? 

Business executives themselves have not even begun to utilize the 
full potentialities of Management Statistics as instruments of adminis- 
tration and control. Of course, management statistics and reports do 
not take the place of ability and judgment, but they are vital require- 
ments for the effective use of ability and judgment. They are the 
methods of communication, the measures, and to a large extent, the 
means of interpreting happenings without which ability and judgment 
would be rendered impotent. 

An automobile will not get one to a destination by itself. Only when 
the vehicle is used properly will it proceed to its goal. And the better 
the vehicle and the better trained the chauffeur to operate it, the more 
pleasant, safer, and surer will be the journey. By the same token, a 
sales analysis, or a production report, or a budget statement lying on 
an executive’s desk, will not add one iota to the effective management 
of a firm. Only when the facts thus disclosed are used by the executive 
will an advantage be gained. And the advantage will be greater by the 
same measure that the report is effectively prepared and that the
executive knows how to obtain the utmost value from the data presented
in the report.


2 “Advertising and Selling,” Vol. VIII, No. 5, p. 24, December 29, 1926. (Also cited in Saunders
and Anderson, Business Reports, 1st Ed., McGraw-Hill, New York, 1929.) 





























THE MYTH OF THE SECURITY AFFILIATE 


By George W. Edwards
College of the City of New York 


THE DECADE of the thirties witnessed the legal divorce of investment
from commercial banking. The Banking Act of 1933 forbade incorporated
commercial banks from operating so-called investment affiliates and
forced private banks to abandon either their deposit or their security
operations. The Act sought to convert the function of American banking
from an integrated to a specialized type.

This legislation rested on the belief that financial institutions com- 
bining both commercial and investment operations were against public 
interest. The hearings conducted by Congressional committees dis- 
closed irregularities and abuses by the affiliates and by the private 
banks extending both investment and commercial credit. Subsequent 
financial literature, using the hearings as source material, has uniformly 
condemned integrated banking. In consequence this type is anathema 
to the general public. And yet the hearings did not face the funda- 
mental issue of comparing the securities sponsored by integrated in- 
vestment banks as against those floated by specialized investment 
banks. Save for a fragmentary table, furnished by a bank and not by 
the government, the hearings¹ did not even show the classes of securities
floated by the various types of investment houses. The hearings pre- 
sented no statistical analysis of the relative market results of the se- 
curities sold to the public by the non-affiliates as compared with those 
sponsored by the affiliates or by the private banks. 

This article² undertakes a comparative analysis of the domestic
corporate bonds floated by the various types of investment banking
institutions. The analysis includes 2,633 bonds with a par value of 19,521
million dollars which provide a comprehensive base for deriving
conclusions. These were all the bonds outstanding at the end of 1936 on
which complete data could be obtained.³ These bonds are studied in
relation to the various types of sponsoring investment banks. They are 
first grouped according as they are (1) non-affiliates or those performing 
no commercial banking operations, (2) affiliates or those structurally 
related to incorporated commercial banks, and (3) private houses or 

1 Hearings on the Operation of National and Federal Reserve Banking System, Part 2, pp. 298-°° 
United States Banking and Currency Committee (Senate) 71st Congress, 3rd Session. 

2 The conclusions in this study do not apply to shares, foreign corporate or domestic or foreign 
government bonds floated by the investment banks. 


3 This analysis is part of a larger study begun early in 1938, and the end of 1936 was the latest
date for which annual figures were available. The study omitted all bonds issued after the Banking Act of
1933 went into operation.
unincorporated houses engaged in both deposit and security banking.
These banks are further classified by size, as large and small. The large
banks include all the major investment houses in the period before the
passage of the Banking Act of 1933, as listed by the Wall Street Journal
in its compilation of security flotations for the years from 1927 to 1932
inclusive.⁴ The small banks include all the remaining financial
institutions.
RELATIVE IMPORTANCE OF INTEGRATED AND 
SPECIALIZED INVESTMENT BANKING 


The greater part of the corporate bonds was sponsored by
non-affiliates (Table I). They floated two-thirds of the amount of bonds,
private banks a fifth, while the affiliates accounted for less than a
seventh. The affiliate was therefore the least important institution in
the new corporate bond market.


TABLE I


AMOUNT OF CORPORATE BONDS FLOATED BY INVESTMENT BANKS
(Per cent of total amount of bonds)

                                                     Total
Investment bank       Large     Small     Total      (million dollars)
Non-affiliate         50.01     15.67     65.68      $12,819.5
Affiliate             11.26      2.40     13.66        2,664.5
Private bank          19.90      0.76     20.66        4,037.3
Total                 81.17     18.83    100.00      $19,521.3


The large houses dominated the corporate bond market. They ac- 
counted for over four-fifths of the total amount of bonds in the study. 
The large banks were far more important than the small banks in all 
three groups of houses. The large banks accounted for almost all the 
issues floated by the private banks, but were relatively less important 
in the case of the non-affiliates. 

There were certain interesting variations from these total figures 
when the bonds were separated according to class of issuer (Table II). 
Bonds may be conveniently grouped according to issuer, as utility, rail, 
industrial, real estate and financial. In the utility fields the non-affiliates 
were especially important. They floated over three-quarters of the 
electric light, natural gas, water and utility holding company bonds. 
On the other hand, compared to their average for bonds as a whole, 
the affiliates had a high proportion in the weak traction and toll fields. 
The private banks were important only in the strong telephone field, 


4 Wall Street Journal, March 22, 1933, p. 10.
where they sponsored over half of the financing. The private banks also
had more than their proportionate share of railroad financing. They
offered over 70 per cent of the terminal bonds and over 28 per cent of the
general railroad obligations. The equipment bonds, due probably to the
fact that they were sold through competitive bidding, were more evenly
distributed among the various types of investment houses. The
non-affiliated banks financed about two-thirds of the general railroad bonds.
Of the industrial bonds, the affiliates financed more than their general
average, but the private banks less than their general average. The
former financed almost a quarter of these bonds, and the latter only
6 per cent. The affiliates were important in the manufacturing field,
while the private banks accounted for approximately a fifth of the
extractive and the trade issues. The small non-affiliated banks completely
dominated the uncertain real estate, finance company and investment
trust bonds.


TABLE II


DISTRIBUTION OF CLASSES OF BONDS ACCORDING TO INVESTMENT BANKS
(Per cent of amount of bonds in each class)

                      Non-affiliate     Affiliate      Private bank      Total      Total amount
Class of issuer       Large   Small    Large   Small   Large   Small    per cent   (million dollars)
Utility
  Electric light       60.5    15.5     10.8     1.7     9.8     1.7     100.0      $4,326.5
  Natural gas          28.7    50.9     16.7     3.7      --      --     100.0         158.3
  Telephone            30.1     5.7      7.3     1.0    55.9      --     100.0         899.9
  Toll                 46.9    26.1     13.3    11.0     2.7      --     100.0          63.8
  Traction             25.5     7.6     27.8     6.1    28.8     4.2     100.0         655.0
  Water                51.1    41.5      4.5     2.9      --      --     100.0         219.8
  Holding              75.1    10.2     10.7     3.5      --     0.5     100.0       1,562.4
  Total                56.0    14.2     11.8     2.5    14.1     1.4     100.0      $7,886.3
Railroad
  General              48.8    12.1      8.8     1.8    28.4     0.1     100.0      $7,968.0
  Equipment            31.7    14.2     26.2     0.4    27.5      --     100.0         461.8
  Terminal             11.8    15.6      0.9     0.3    71.4      --     100.0         327.9
  Total                46.6    12.3      9.4     1.6    30.0     0.1     100.0      $8,757.5
Industrial
  Manufacture          52.9    15.4     24.6     3.3     3.4     0.4     100.0      $1,231.3
  Extractive           36.2    21.5     18.8     4.1    19.1     0.3     100.0         344.1
  Service              57.0    36.9      4.1     1.7     0.3      --     100.0         285.2
  Trade                26.1    40.7      0.5    15.5    17.2      --     100.0          59.4
  Total                49.7    20.4     19.8     3.6     6.2     0.3     100.0      $1,920.0
Real estate            21.5    62.2      7.2     8.5     0.6      --     100.0      $  654.4
Financial
  Finance company      35.0    58.1      1.5     5.4      --      --     100.0      $   33.1
  Investment trust     65.4    15.5      4.7      --     1.6    12.8     100.0         177.0
  Total                60.6    22.2      4.2     0.9     1.4    10.7     100.0      $  210.1

There were also significant variations in the distribution of the
several classes of bonds, according to the large and the small houses. In the
utility field the large houses accounted for an unusually heavy
proportion of the electric light, telephone, traction and holding company
bonds. On the other hand, the small houses financed a large proportion
of natural gas, toll, and water bonds. The large houses completely
dominated the flotation of railroad bonds. Over four-fifths were
sponsored by the major houses. On the other hand, the small houses gained
more than their share of industrial financing, particularly service and
trade companies. The small non-affiliates accounted for almost
three-fifths of the real estate and finance company bonds. The minor private
banks, while financing only a small fraction of the total amount of
bonds, accounted for an eighth of all the investment trust bonds.


TABLE III


CLASSES OF BONDS FLOATED BY INVESTMENT BANKS
(Per cent of total amount of bonds)

                          Non-affiliate         Affiliate           Private bank
Class of issuer           Large     Small      Large     Small      Large     Small
Utility
  Electric light           27.0      22.0       21.4      15.8       11.0      49.0
  Natural gas               0.5       2.6        1.2       1.2         --        --
  Telephone                 2.8       1.8        3.0       1.8       13.1        --
  Toll                      0.3       0.6        0.4       1.4        0.1        --
  Traction                  1.7       1.6        8.3       8.6        4.9      18.6
  Water                     1.2       3.0        0.5       2.0         --        --
  Holding                  12.1       5.2        7.6      11.7         --       5.8
Railroad
  General                  40.0      31.6       32.0      30.3       58.5       7.2
  Equipment                 1.5       2.2        5.6       0.3        3.3        --
  Terminal                  0.4       1.7        0.2       0.2        6.0        --
Industrial
  Manufacture               6.7       6.2       13.9       8.5        1.0       3.4
  Extractive                1.3       2.4        2.9       3.0        1.7       0.6
  Service                   1.7       3.4        0.5       1.0         --        --
  Trade                     0.2       0.8       0.01       2.0        0.2        --
Real estate                 1.4      13.4        2.3      11.8        0.1        --
Financial
  Finance company           0.1       0.6         --       0.4         --        --
  Investment trust          1.2       0.9        0.4        --        0.1      15.4
Total, per cent           100.0     100.0      100.0     100.0      100.0     100.0
Total, amount
  (million dollars)    $9,715.7  $3,045.0   $2,187.9    $465.8   $3,866.9    $147.0


As a result of these different distributions, according to the class of
issuer, there were variations in the composition of the total bonds
floated by the several types of investment banks (Table III). While
electric light and railroad bonds naturally constituted the bulk for each
type of bank, the percentage ranged widely. Almost three-fifths of the
bonds floated by the large private banks were general railroad
obligations. Of the bonds sold by the large non-affiliates, the electric light and
general railroad bonds constituted over two-thirds. These bonds
composed about half of the issues floated by the small non-affiliates and by
the large affiliates, and less than half of the bonds financed by the small
affiliates. Almost half of the bonds of the small private banks were
electric light issues.























TABLE IV

INVESTMENT RESULTS OF BONDS ACCORDING TO INVESTMENT BANKS
(Per cent)

                      Non-affiliate    Affiliate       Private bank    Total
Tests of                                                                                Non-                 Private
investment results    Large   Small    Large   Small   Large   Small   Large   Small    affiliate  Affiliate  bank
Coverage*              1.63    1.50     1.68    1.41    1.75    1.62    1.66    1.49     1.58       1.58      1.73
Price stability          57      62       59      59      49      57      56      61       59         59        49
Default
  Number                 21      40       27      35      10      29      20      39       30         30        11
  Amount                 17      34       21      25       9       8      16      32       21         22         9
Net return†            5.40    5.75     5.42    5.71    4.89    5.20    5.32    5.73     5.58       5.54      4.94
Median yield‡          5.15    8.83     5.24    7.48    4.17    5.87

* Times fixed charges earned.
† Date of issue.
‡ December, 1936.


Each type of bank specialized in a minor field. The large non-affiliates 
were interested in financing utility holding companies. Manufacturing 
bonds represented fourteen per cent of the issues of the large affiliates, 
while real estate bonds constituted 13 per cent of the small non- 
affiliate issues and 12 per cent of the small affiliate securities. Telephone 
bonds accounted for 13 per cent of the offerings of the large private 
banks, while the small private banks had a conspicuous proportion of 
investment trust issues. 


INVESTMENT RESULTS 


The second part of this article compares the market action of the 
bonds floated by the various types of houses (Table IV). This market 
action is judged by studying the fixed charge coverage, price stability, 
default, net return and median yield. Coverage was the number of 
times that net earnings were available to meet the fixed charges on the
bond of a corporation. In this study a ten-year average was taken, and 
in the comparatively small number of cases where the earnings record 
was incomplete, as long a period as possible was used. Price stability 
was computed by taking the variation between the highest and the 
lowest price of each bond for the period from 1930 through 1936, and 
dividing the difference by the highest price. Default was interpreted as 
failure to pay fully and unconditionally the interest or principal of the 
bond. These tests were applied to each bond separately and so the 
results refer to the number of the bonds of each type of bank and not 
to amount, except in the case of default where both number and 
amount were used. The net return was computed from the issue price, 
maturity and coupon of each bond. This net return was used as a basis 
for judging the quality of the bond at the time of its offering to the 
investing public. In general a high net return on a bond indicated poor 
grade, while a low net return reflected good quality. The median yield 
was the yield to maturity of the middle group of bonds when ranged 
according to numbers and in the study the end of December, 1936, 
was taken as the basic date. 
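
The price stability test described above reduces to a single ratio for each bond. The following minimal Python sketch shows that computation; the function name and the sample prices are hypothetical and are not taken from the study.

```python
def price_stability(highest, lowest):
    """Return the 1930-1936 price range as a percentage of the highest price.

    A smaller figure means a steadier price.
    """
    if highest <= 0 or not 0 <= lowest <= highest:
        raise ValueError("expected 0 <= lowest <= highest and highest > 0")
    return 100.0 * (highest - lowest) / highest


# Hypothetical bond (not from the study): high of 102, low of 51.
EXAMPLE = price_stability(102.0, 51.0)
print(f"price range: {EXAMPLE:.0f} per cent of the highest price")
```

Because the measure is a range, the lower figures in the price stability row of Table IV belong to the steadier groups of bonds.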

There was practically no difference in the investment results of the 
bonds floated by affiliate and non-affiliate banks. In fact there was a 
remarkable uniformity in the market experience of these two groups. 
They both showed the same coverage of 1.58 times, the same price
range of 59 per cent and almost the identical percentage of default of 
30 per cent for the total number and 21 and 22 per cent for the total 
amount. The bonds of the private banks were far superior in every 
respect. The average coverage was higher, they were more stable in 
price and the percentage of default was far smaller at only 11 per cent 
for the total number and 9 per cent for the total amount. 

There was a significant difference between the bonds of the large and 
of the small houses. The bonds of the large banks were much better 
than the issues of the small banks. The coverage on the former was 
higher, their price was somewhat more stable and the percentage of 
default was much lower. The superiority of the bonds of the large 
houses is confirmed when they are studied separately according to each 
type of bank, for in almost every case the bonds of the large banks had 
better coverage, steadier price range and smaller percentage of default. 

The above conclusions on coverage are confirmed by a study of the 
separate classes of bonds (Table V). This analysis is necessary, since 
coverage varies with the different classes of bonds. There was no pro- 
nounced difference in the coverage of the various classes of bonds 
floated by the non-affiliate and the affiliate banks. The traction, utility
holding and real estate bonds of the non-affiliates were somewhat 
better protected, while the electric light, railroad and manufacturing 
bonds of the affiliates had slightly better coverage. In every case the 
bonds sponsored by the private banks had decidedly stronger coverage. 
This was true of the electric light, telephone and manufacturing bonds, 
while only their traction bonds were weaker. In the greater number of 
cases for each class of bond, the large houses had better protection than 
the small concerns. Specifically 11 of the groups of bonds of the large 
houses had better coverage as against only 4 of the smaller houses. 


TABLE V


COVERAGE OF CLASSES OF BONDS FLOATED BY INVESTMENT BANKS
(Times fixed charges earned)

                      Non-affiliate      Affiliate         Private bank
                      Large    Small     Large    Small    Large    Small
Electric light         2.03     1.96      2.26     2.09     2.73     1.85
Telephone              2.47     1.95        *      1.84     2.76       *
Traction               1.12     1.37      1.01     0.82     1.12     1.32
Utility holding        1.68     1.43      1.62     1.32       *        *
Railroad-general       1.41     1.29      1.84     1.46     1.56       *
Manufacturing          1.75     1.43      1.81     2.01     2.24       *
Real estate            1.31     1.05      0.85     1.09       *        *

* Insufficient number of bonds to evaluate computation.
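
The count of eleven cases against four quoted in the text can be checked from the figures of Table V. The short Python sketch below does so, entering the starred cells as None; the dictionary layout and the function name are illustrative choices, not part of the study.

```python
# Coverage from Table V as (large, small) pairs; None marks a starred cell.
TABLE_V = {
    "Electric light": [(2.03, 1.96), (2.26, 2.09), (2.73, 1.85)],
    "Telephone": [(2.47, 1.95), (None, 1.84), (2.76, None)],
    "Traction": [(1.12, 1.37), (1.01, 0.82), (1.12, 1.32)],
    "Utility holding": [(1.68, 1.43), (1.62, 1.32), (None, None)],
    "Railroad-general": [(1.41, 1.29), (1.84, 1.46), (1.56, None)],
    "Manufacturing": [(1.75, 1.43), (1.81, 2.01), (2.24, None)],
    "Real estate": [(1.31, 1.05), (0.85, 1.09), (None, None)],
}


def tally(table):
    """Count the comparable pairs in which large or small houses lead."""
    large_better = small_better = 0
    for pairs in table.values():
        for large, small in pairs:
            if large is None or small is None:
                continue  # no comparison possible for this type of bank
            if large > small:
                large_better += 1
            elif small > large:
                small_better += 1
    return large_better, small_better


LARGE_BETTER, SMALL_BETTER = tally(TABLE_V)
print(f"large houses better in {LARGE_BETTER} cases, small in {SMALL_BETTER}")
```

Each pair is ordered non-affiliate, affiliate, private bank, and only cells with both a large and a small figure enter the count.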


A study of the net return and the median yield shows certain in- 
teresting features. There was practically no difference at the time of 
issue in the quality of the bonds floated by the non-affiliates and the 
affiliates. The net return on both these groups showed very little differ- 
ence. There was a spread of only two points between the net return on 
the large non-affiliate bonds and that on the large affiliate bonds, only 
four points between the small non-affiliates and the small affiliates, and 
but four points between the total affiliates and the total non-affiliates.
The quality of the bonds of the private banks was much higher than
of those of both the non-affiliates and the affiliates. The bonds of the 
private banks sold on a yield of 60 points lower than those of the 
affiliates and 64 points below those of the non-affiliates. 

The bonds of the large banks were of better grade than those of the 
small banks. This condition was true of all three types of banks. The 
large non-affiliate bonds were 35 points below the issues of the small 
non-affiliates; the bonds of the large affiliates were 29 points lower, and 
the large private banks, 31 points lower. The total bonds of the large 
banks were 41 points below those of the total for the small banks. The
bonds of the large houses brought more favorable results to the investing
public than those of the small houses.⁵ On the one hand, the issues
of the large non-affiliates, the large affiliates and the major private 
banks, all showed net returns above the median yields, while, on the 
other hand, the bonds of the small non-affiliates, small affiliates and 
minor private banks all evidenced net returns below the median yields. 
In the case of the small non-affiliates and small affiliates, the return at 
date of issue was lower than the median yield indicating heavy losses 
to the investing public. 
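
The comparison of net return with median yield can be laid out group by group. The Python sketch below takes the six pairs of figures from Table IV and expresses each difference in points (hundredths of one per cent); the names used are illustrative, and the reading of the sign follows the rule the article states for judging appreciation.

```python
# Net return at date of issue and median yield in December 1936 (per cent).
NET_RETURN = {
    ("Non-affiliate", "Large"): 5.40, ("Non-affiliate", "Small"): 5.75,
    ("Affiliate", "Large"): 5.42, ("Affiliate", "Small"): 5.71,
    ("Private bank", "Large"): 4.89, ("Private bank", "Small"): 5.20,
}
MEDIAN_YIELD = {
    ("Non-affiliate", "Large"): 5.15, ("Non-affiliate", "Small"): 8.83,
    ("Affiliate", "Large"): 5.24, ("Affiliate", "Small"): 7.48,
    ("Private bank", "Large"): 4.17, ("Private bank", "Small"): 5.87,
}


def spread_in_points(group):
    """Net return minus median yield, in hundredths of one per cent."""
    return round((NET_RETURN[group] - MEDIAN_YIELD[group]) * 100)


SPREADS = {group: spread_in_points(group) for group in NET_RETURN}
for (bank, size), points in SPREADS.items():
    verdict = "appreciated" if points > 0 else "depreciated"
    print(f"{size} {bank}: {points:+d} points, {verdict}")
```

A positive spread marks a group whose bonds have risen in value since issue; a negative spread marks a loss to the investing public.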

In conclusion this study shows that there is no statistical foundation 
for the belief that specialized banking was superior to integrated bank- 
ing. There was practically no difference in the investment results of the 
bonds of non-affiliate and affiliate banks. In fact, the bonds of the 
private banks performing both commercial and investment operations 
were superior to those of the non-affiliate and affiliate banks. It would 
seem therefore that the Banking Act of 1933 separating commercial 
from investment banking did not rest on factual foundation. The belief 
in the utter financial depravity of integrated as compared with special- 
ized investment banking is a myth and not a legend. A legend at least 
has an element of historic truth. 


5 This trend may be seen by studying the net return in relation to the median yield. This
relationship shows in a general way whether the bonds in a group have appreciated or depreciated in value from
their date of issue. A net return above the median yield indicates that the group of bonds has
appreciated in value and that the investing public has gained from this group. Conversely a net return below
the median yield means that the group has depreciated in value and that the investing public has lost
on the group since the date of offering.











THE APPLICATION OF THE THEORY OF LINEAR 
HYPOTHESES TO THE COEFFICIENT OF 
ELASTICITY OF DEMAND 


By M. A. Girshick
U. S. Department of Agriculture 


APPROXIMATIONS to the standard error of the coefficient of elasticity
of demand were given in this JOURNAL by Henry Schultz¹ and
Jacob L. Mosak.² In a more recent issue of this JOURNAL, H. Gregg
Lewis³ has shown that under certain conditions, the distribution of this
coefficient can be approximated by the normal curve.

The difficulty in applying the results obtained by the above authors 
to any given problem is that both the standard error formulae and the 
distribution involve population parameters whose values are seldom 
known. The substitution of sample estimates for these parameters is 
from a theoretical point of view a doubtful procedure, especially if the 
estimates are based on a small sample. 

Here it is proposed to deal with the coefficient of elasticity of demand 
from the point of view of the theory of linear hypotheses. The statisti- 
cal inferences drawn concerning this coefficient will not depend on any 
a priori knowledge of the values of population parameters. 

In the following discussion it will be assumed that the mean or
expected value of a normally distributed quantity variate $X_1$ is linearly
related to $X_2, X_3, \ldots, X_k$, where $X_2$ is price and $X_3, \ldots, X_k$ are
other variables. More specifically, it will be assumed that for given
values of $X_2$ to $X_k$, $X_1$ is normally distributed with variance $\sigma^2$ and
mean

$$E(X_1) = \beta_1 + \beta_2 X_2 + \cdots + \beta_k X_k,$$

where $\beta_1, \beta_2, \ldots, \beta_k$ are the population regression coefficients.

Since for a demand function $\partial E(X_1)/\partial X_2$ is negative, the partial
coefficient of elasticity of demand $\eta$ will be defined as:

$$\eta = -\frac{\partial E(X_1)}{\partial X_2}\,\frac{X_2}{E(X_1)} = \frac{-\beta_2 X_2}{\beta_1 + \beta_2 X_2 + \cdots + \beta_k X_k}. \tag{1}$$

Suppose that a sample of $N$ independent observations has been
obtained on the quantity variate $X_1$ with corresponding observations on
the variables $X_2$ to $X_k$. On the basis of the information supplied by this
sample, the following statistical problems may be raised:


1 “The Standard Error of the Coefficient of Elasticity of Demand,” March 1933, pp. 64-69. 

2 “The Least Squares Standard Error of the Coefficient of Elasticity of Demand,” June 1939, pp. 
353-361. 

3 “On the Distribution of the Partial Elasticity Coefficient,” September 1941, pp. 413-416. 
Problem I: Estimate the elasticity of demand $\eta$ for assigned values of
$X_2$ to $X_k$.

Problem II: Test the hypothesis that $\eta = \eta_0$ ($\eta_0$ being a definite
number) when $X_2$ to $X_k$ have some assigned values.

Problem III: For given values of $X_2, X_3, \ldots, X_k$, find fiducial limits
for $\eta$.

Problem IV: For given values of $\eta, X_3, \ldots, X_k$, find fiducial limits
for $X_2$.

Estimating the elasticity of demand. The problem of estimating the
coefficient of elasticity of demand is usually answered by substituting
for the $\beta$'s in formula (1) the corresponding least squares regression
coefficients, and for $X_2$ to $X_k$ the specific values assigned to them. This
method yields a maximum likelihood estimate of $\eta$. This follows from
the fact that a maximum likelihood estimate of a function of population
parameters is equal to the same function with the parameters
replaced by their maximum likelihood estimates.
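
The substitution just described can be illustrated for the simplest case, $k = 2$, in which quantity is regressed on price alone. The Python sketch below is a minimal illustration under that assumption; the function names and the observations are hypothetical.

```python
def fit_line(prices, quantities):
    """Least squares estimates (b1, b2) of quantity = b1 + b2 * price."""
    n_obs = len(prices)
    mean_p = sum(prices) / n_obs
    mean_q = sum(quantities) / n_obs
    a22 = sum((p - mean_p) ** 2 for p in prices)
    a12 = sum((q - mean_q) * (p - mean_p) for p, q in zip(prices, quantities))
    b2 = a12 / a22
    b1 = mean_q - b2 * mean_p
    return b1, b2


def elasticity_estimate(b1, b2, price):
    """Formula (1) with the b's substituted for the betas (case k = 2)."""
    return -b2 * price / (b1 + b2 * price)


# Hypothetical observations lying exactly on quantity = 100 - 2 * price.
B1, B2 = fit_line([10, 20, 30, 40], [80, 60, 40, 20])
ETA_HAT = elasticity_estimate(B1, B2, price=25)
print(f"b1 = {B1:.1f}, b2 = {B2:.1f}, elasticity at price 25 = {ETA_HAT:.2f}")
```

With more than one explanatory variable the same substitution applies, with the remaining regression terms added to the denominator.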

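Testing the hypothesis that $\eta = \eta_0$. In order to answer Problem II let
it be assumed that the values assigned to $X_2, X_3, \ldots, X_k$ are
$X_2', X_3', \ldots, X_k'$ respectively. The hypothesis to be tested is that

$$\frac{-\beta_2 X_2'}{\beta_1 + \beta_2 X_2' + \cdots + \beta_k X_k'} = \eta_0,$$

or

$$\beta_1 + \beta_2 Z_2 + \cdots + \beta_k Z_k = 0, \tag{2}$$

where

$$Z_2 = X_2'\left(\frac{1 + \eta_0}{\eta_0}\right) \quad \text{and} \quad Z_j = X_j' \quad (j = 3, \ldots, k).$$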


The hypothesis as stated in (2) is an example of a general class of
statistical hypotheses known as “linear” hypotheses. The first
comprehensive discussion of such hypotheses was given in 1935 by the Polish
statistician St. Kolodziejczyk.⁴ For a full discussion of the theory of
testing linear hypotheses, the reader is referred to Kolodziejczyk’s
paper and also to a paper by Palmer O. Johnson and J. Neyman.⁵ For
the purpose of this paper it will suffice to give the details for testing the
hypothesis expressed by equation (2).

Let $X_{i\alpha}$ stand for the $\alpha$th observation of the $i$th variable
($i = 1, 2, \ldots, k$; $\alpha = 1, 2, \ldots, N$). Let $\bar{X}_i$ be the sample mean of the
$i$th variable and $a_{ij} = \sum_{\alpha=1}^{N} (X_{i\alpha} - \bar{X}_i)(X_{j\alpha} - \bar{X}_j)$. Furthermore, let


4 “On an Important Class of Statistical Hypotheses,” Biometrika, Vol. XXVII, 1935.

5 “Tests of Certain Linear Hypotheses and Their Application to Some Education Problems,” 
Statistical Research Memoirs, Vol. 1 (1936) Department of Statistics, University of London, University 
College. 
$b_1, b_2, \ldots, b_k$ be the least squares estimates of $\beta_1, \beta_2, \ldots, \beta_k$
respectively. In terms of these quantities it can be shown that the statistic
appropriate for testing hypothesis (2) is given by

$$Q = \frac{\bar{X}_1 + b_2(Z_2 - \bar{X}_2) + \cdots + b_k(Z_k - \bar{X}_k)}{s\left[\dfrac{1}{N} + \sum_{i=2}^{k}\sum_{j=2}^{k} C_{ij}(Z_i - \bar{X}_i)(Z_j - \bar{X}_j)\right]^{1/2}}, \tag{3}$$

where $s$ is the standard error of estimate, i.e.

$$s = \sqrt{\frac{a_{11} - b_2 a_{12} - \cdots - b_k a_{1k}}{N - k}},$$

and $C_{ij}$ is the cofactor of the element $a_{ij}$ in the matrix $(a_{ij})$ divided by
the determinant of the matrix $(a_{ij})$ ($i, j = 2, 3, \ldots, k$).

From the general theory of testing linear hypotheses, it follows that
the quantity in (3) has a Student’s $t$ distribution with $N - k$ degrees of
freedom.⁶

Therefore, to test the hypothesis that $\eta = \eta_0$, the following procedure
is to be used. Calculate $Q$ in (3). Find from published tables that value
of $t$ which is exceeded in random samples, say, five times in a hundred.
(That is, find the value of $t$ for the .05 level of significance with
$n = N - k$ degrees of freedom.) If the calculated value of $Q$ (disregarding
sign) exceeds the value of $t$ given in the table, the hypothesis that
$\eta = \eta_0$ is rejected. If $Q$ does not
exceed this value of $t$, the hypothesis is not rejected.
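
The procedure can be sketched for the case $k = 2$, where the double sum in (3) has the single term $C_{22}(Z_2 - \bar{X}_2)^2$ and $C_{22} = 1/a_{22}$. The Python sketch below is an illustration only: the function name and the five observations are hypothetical, and the critical value 3.182 is the tabled two-sided .05 value of $t$ for 3 degrees of freedom.

```python
import math


def q_statistic(prices, quantities, price_at, eta0):
    """Q of formula (3) for k = 2, where C_22 = 1 / a_22."""
    n_obs = len(prices)
    mean_p = sum(prices) / n_obs
    mean_q = sum(quantities) / n_obs
    a22 = sum((p - mean_p) ** 2 for p in prices)
    a11 = sum((q - mean_q) ** 2 for q in quantities)
    a12 = sum((q - mean_q) * (p - mean_p) for p, q in zip(prices, quantities))
    b2 = a12 / a22                                  # least squares slope
    s = math.sqrt((a11 - b2 * a12) / (n_obs - 2))   # standard error of estimate
    z2 = price_at * (1 + eta0) / eta0               # Z_2 as defined under (2)
    numerator = mean_q + b2 * (z2 - mean_p)
    denominator = s * math.sqrt(1 / n_obs + (z2 - mean_p) ** 2 / a22)
    return numerator / denominator


# Hypothetical sample of N = 5 observations.
PRICES = [10, 15, 20, 25, 30]
QUANTITIES = [81, 69, 60, 51, 39]
T_05 = 3.182  # two-sided .05 value of t for N - k = 3 degrees of freedom

Q = q_statistic(PRICES, QUANTITIES, price_at=20, eta0=1.0)
print("reject" if abs(Q) > T_05 else "do not reject", "the hypothesis eta = 1")
```

When the hypothetical value of the elasticity equals the estimate obtained by substitution in (1), the numerator of $Q$ vanishes, so that value is never rejected.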

Fiducial limits for $\eta$. The fiducial limits for the coefficient of elasticity
of demand can be obtained from the following considerations: It will
be noticed that the quantity $Q$ in formula (3) has a Student’s $t$
distribution no matter what $\eta$ (contained in the definition of $Z_2$) is. Consequently
one can choose a level of significance, say .05, and, setting $n = N - k$, find
that value of $t$ (which may be designated by $t_{.05}$) which corresponds to
the level of significance chosen. Then setting $Q^2 = t_{.05}^2$ one can solve for
$\eta$. Since the equation $Q^2 = t_{.05}^2$ is of the second degree in $\eta$, the solution
will yield two values of $\eta$. These two values are the required (upper and
lower) fiducial limits for $\eta$. The interval defined by these two values is
sometimes known as the “confidence” interval.








6 The truth of the above statement can also be seen from the following considerations. Let
$u = b_1 + b_2 Z_2 + \cdots + b_k Z_k$. Then if the hypothesis is true, the expected value of $u$ is zero. Now a
straightforward calculation will show that $\sigma_u^2$, the variance of $u$, is precisely the square of the quantity given in the
denominator of (3) with $\sigma^2$ substituted for $s^2$. Moreover, since $u$ is a linear function of the least squares
regression coefficients, it is distributed normally and independently of $s^2$ (since each $b$ is distributed
normally and independently of $s^2$). The quantity $u^2/\sigma_u^2$ has a chi-square distribution with 1 degree of
freedom and $(N - k)s^2/\sigma^2$ has a chi-square distribution with $N - k$ degrees of freedom. Hence, the square
root of the ratio of $u^2/\sigma_u^2$ to $s^2/\sigma^2$ (which is the expression given in (3)) has Student's $t$ distribution with
$N - k$ degrees of freedom.
The “confidence” interval thus obtained for the coefficient of
elasticity of demand has the following meaning. If whenever the statistician
is sampling a variate $X_1$ (which has the distribution defined on page 233)
he calculates the fiducial limits for $\eta$ and makes the statement that the
true value of $\eta$ lies in the interval thus defined, he will in the long run
be right in 95 per cent of the cases. Or to put it in another way, if the
statistician obtains a great many samples of a variate having the
distribution attributed to $X_1$ and if he calculates the above intervals for
each of the samples, then he may expect that 95 per cent of these
intervals will cover the true value of $\eta$.

Fiducial limits for $X_2$. The statistician may in some cases be
interested in estimating that price ($X_2$) which, for specified values of the
remaining variables, corresponds to a given coefficient of elasticity of
demand. This estimate may be obtained by substituting the $b$’s for the
corresponding $\beta$’s in equation (1) and solving for $X_2$. The fiducial limits
for $X_2$ for given values of $\eta, X_3, \ldots, X_k$ can be obtained in the same
manner as outlined for obtaining the fiducial limits for $\eta$. That is, one
sets $Q^2 = t_{.05}^2$ and solves for the two values of $X_2$ which this equation
yields.

In dealing with the coefficient of elasticity of demand, the economist
is often concerned with its value only at the point of averages of all or a
subset of the independent variables. For this purpose formulae for the
fiducial limits of the coefficient of elasticity and $X_2$ are derived below
in an explicit form.

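In equation (2) set $X_j' = \bar{X}_j$ ($j = 2, 3, \ldots, k$). Then $Q$ takes on the
simple expression

$$Q = \frac{\bar{X}_1 + b_2\bar{X}_2/\eta}{s\left[1/N + C_{22}\bar{X}_2^2/\eta^2\right]^{1/2}}. \tag{4}$$

Equating $Q^2$ to $t_{.05}^2$ and solving for $\eta$ yields

$$\eta = \frac{b_2\bar{X}_1\bar{X}_2 \pm \bar{X}_2\sqrt{b_2^2\bar{X}_1^2 + (t_{.05}^2 s^2/N - \bar{X}_1^2)(b_2^2 - C_{22}t_{.05}^2 s^2)}}{t_{.05}^2 s^2/N - \bar{X}_1^2}. \tag{5}$$

The above two values of $\eta$ give the required fiducial limits.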
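
As a numerical check on the limits at the point of averages, the two roots can be computed and substituted back into the expression for $Q$. The Python sketch below does this for $k = 2$ under stated assumptions: the summary figures are hypothetical, $C_{22}$ is taken as $1/a_{22}$, and 3.182 is the tabled two-sided .05 value of $t$ for 3 degrees of freedom. It does not treat the case in which the quadratic has no real roots beyond raising an error.

```python
import math


def fiducial_limits_at_means(n_obs, mean_q, mean_p, b2, s2, c22, t_crit):
    """Solve Q^2 = t^2 for eta at the point of averages."""
    t2s2 = t_crit ** 2 * s2
    a = t2s2 / n_obs - mean_q ** 2      # coefficient multiplying eta^2
    b = b2 ** 2 - c22 * t2s2
    radicand = b2 ** 2 * mean_q ** 2 + a * b
    if radicand < 0:
        raise ValueError("no real limits: the quadratic has complex roots")
    root = mean_p * math.sqrt(radicand)
    centre = b2 * mean_q * mean_p
    return tuple(sorted(((centre - root) / a, (centre + root) / a)))


def q_at_means(eta, n_obs, mean_q, mean_p, b2, s2, c22):
    """Q evaluated at the point of averages for a trial value of eta."""
    spread = s2 * (1 / n_obs + c22 * mean_p ** 2 / eta ** 2)
    return (mean_q + b2 * mean_p / eta) / math.sqrt(spread)


# Hypothetical summary figures: N = 5, mean quantity 60, mean price 20,
# slope b2 = -2.04, s^2 = 1.2 and a22 = 250 (so C_22 = 1/250).
SUMMARY = dict(n_obs=5, mean_q=60.0, mean_p=20.0, b2=-2.04, s2=1.2, c22=1 / 250)
T_05 = 3.182
LOWER, UPPER = fiducial_limits_at_means(t_crit=T_05, **SUMMARY)
print(f"fiducial limits for eta: {LOWER:.3f} to {UPPER:.3f}")
```

For these figures the estimate obtained by substitution, 0.68, lies between the two limits, and $Q^2$ evaluated at either limit returns the squared critical value.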
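If $Z_j = \bar{X}_j$ for $j = 3, 4, \ldots, k$ and $X_2 = X_2'$, then $Q$ reduces to

$$Q = \frac{\bar{X}_1 + b_2\left[X_2'\left(\dfrac{1 + \eta}{\eta}\right) - \bar{X}_2\right]}{s\left\{\dfrac{1}{N} + C_{22}\left[X_2'\left(\dfrac{1 + \eta}{\eta}\right) - \bar{X}_2\right]^2\right\}^{1/2}}. \tag{6}$$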
Equating $Q^2$ to $t_{.05}^2$ and solving for $\eta$ the result is

$$\eta = \frac{X_2'\left[(b_2^2 - t_{.05}^2 s^2 C_{22})(X_2' - \bar{X}_2) + b_2\bar{X}_1\right] \pm X_2'\sqrt{b_2^2\bar{X}_1^2 + (t_{.05}^2 s^2/N - \bar{X}_1^2)(b_2^2 - C_{22}t_{.05}^2 s^2)}}{(t_{.05}^2 s^2/N - \bar{X}_1^2) - \left[(b_2^2 - C_{22}t_{.05}^2 s^2)(X_2' - \bar{X}_2)^2 + 2b_2\bar{X}_1(X_2' - \bar{X}_2)\right]}. \tag{7}$$

The above two values of $\eta$ give the required fiducial limits.
If $\eta = \eta_0$ and $Z_j = \bar{X}_j$ for $j = 3, 4, \ldots, k$, then the fiducial limits for $X_2$
are given by

$$X_2 = \frac{\eta_0\left[(b_2^2 - t_{.05}^2 s^2 C_{22})\bar{X}_2 - b_2\bar{X}_1\right] \pm \eta_0\sqrt{b_2^2\bar{X}_1^2 + (t_{.05}^2 s^2/N - \bar{X}_1^2)(b_2^2 - C_{22}t_{.05}^2 s^2)}}{(1 + \eta_0)(b_2^2 - t_{.05}^2 s^2 C_{22})}. \tag{8}$$

This is obtained by equating $Q^2$ in (6) to $t_{.05}^2$ and solving for $X_2$.

In case $k = 2$, the quantity $C_{22}$ in the formulae (4) to (8) has the value
$1/a_{22}$ where, as defined above, $a_{22} = \sum_{\alpha=1}^{N} (X_{2\alpha} - \bar{X}_2)^2$.

Frequently, the variable X», instead of being the price of the quantity 
demanded, is a known function of the price. Thus, for example, X2: may 
be given as 1/Y or \/Y where Y is the price of X;. If it is assumed then 
that X.=/f(Y), the coefficient of elasticity of demand is given by 

$$\eta = -\frac{\partial X_1}{\partial Y}\,\frac{Y}{X_1} = -\beta_2\,\frac{dX_2}{dY}\,\frac{Y}{X_1},$$

where $\beta_2$ is the population regression coefficient as defined on page 233.
In order to apply the results developed above to this situation it
is only necessary to substitute in formula (3) for $Z_2$ the quantity

$$\left(X_2' + \frac{dX_2}{dY}\,\frac{Y'}{\eta_0}\right),$$

where $X_2' = f(Y')$ ($Y'$ being a specified value of $Y$)
and $dX_2/dY$ is evaluated at the point $Y = Y'$. The hypothesis that
$\eta = \eta_0$ can then be tested and fiducial limits for $\eta$ can be obtained in the
same manner as outlined previously.
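As a worked illustration of this substitution, take the first of the two examples mentioned, $X_2 = f(Y) = 1/Y$. The lines below assume the sign convention $\eta = -\beta_2\,(dX_2/dY)\,(Y/X_1)$, under which the elasticity of a downward-sloping demand curve is positive, and they assume that the quantity substituted for $Z_2$ is $X_2' + (dX_2/dY)(Y'/\eta_0)$. Both are readings of the printed formulae, not additional results.

```latex
\begin{aligned}
\frac{dX_2}{dY} &= -\frac{1}{Y^2},
&\qquad
\eta &= -\beta_2\left(-\frac{1}{Y^2}\right)\frac{Y}{X_1} = \frac{\beta_2}{Y X_1}, \\[4pt]
X_2' &= \frac{1}{Y'},
&\qquad
X_2' + \frac{dX_2}{dY}\bigg|_{Y=Y'}\frac{Y'}{\eta_0}
  &= \frac{1}{Y'} - \frac{1}{Y'\eta_0} = \frac{\eta_0 - 1}{\eta_0\,Y'} .
\end{aligned}
```

When $X_2$ is the price itself, so that $f(Y) = Y$, the same substitution gives $X_2'(1 + \eta_0)/\eta_0$, which is the expression appearing in formula (6).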








BUSINESS USES OF DATA BY CENSUS 
TRACTS AND BLOCKS* 


By Vergil D. Reed, Assistant Director
Bureau of the Census 


THERE HAS been entirely too much generalizing concerning the national
market, and about the smaller markets of which it is composed.
This may have been justified in the past through lack of dependable
and comparable facts for these component parts of the total
market but that is no longer a rational excuse. Today we can even take 
cities apart into their social and economic segments and see what makes 
them “tick.” From the examination of a national market through a 
telescope we can turn to dissection and then examine the sections un- 
der the microscope. We begin to know something more than the gen- 
eralities which have proven so grossly misleading when applied to spe- 
cific cities or areas within them. 

Sixty of our largest cities have been divided into census tracts. For 
191 cities, including the tracted ones, housing data will even be pub- 
lished by blocks. This is in addition to all the facts published by cities, 
metropolitan districts, and minor civil divisions such as townships, 
parishes, beats, and towns. 

Many data on population and housing are being published by tracts. 
Even the major items for retail trade can be made available by com- 
binations of tracts. As examples of what can be done in these intra-city 
analyses of retail trade attention is called to four such studies made in 
the Census Bureau in connection with the Census of Business, 1935: 


1. Geographic Distribution of Retail Trade in Chicago, Illinois, 

2. Intra-City Business Census Statistics for Philadelphia, Penn- 
sylvania, 

3. Geographic Distribution of Retail Trade in Buffalo, New York, 

4. Changes in Retail Trade in Buffalo, 1929, 1933, and 1935. 


In the series of bulletins entitled “Statistics for Census Tracts,” the 
following standard tables for housing and population will appear: 


1. Population by race and nativity, and occupied dwelling units, 
by census tracts: 1940, 
2. Age, race, and sex, by census tracts: 1940, 


* A paper presented at the 103rd Annual Meeting of the American Statistical Association, New
York, December 29, 1941.

3. Years of school completed, employment status, class of worker,
major occupation group, country of birth, and citizenship, by 
sex, by census tracts: 1940, 

3a. Nonwhite population by years of school completed, employ- 
ment status, class of worker, major occupation group, and 
sex, by census tracts: 1940, 

4. Dwelling units by occupancy status and race of occupants, by 
census tracts: 1940, 

5. Value of owner-occupied dwelling units, contract or estimated 
rent of all dwelling units, and gross rent of tenant-occupied 
units, by census tracts: 1940, 

6. Type of structure, state of repair and plumbing equipment, size 
of household, persons per room, radio, refrigeration and heat- 
ing equipment, by census tracts: 1940, 

6a. Dwelling units occupied by nonwhite households, by value or 
rent, size of household, persons per room, etc., by census 
tracts: 1940. 


A table has been published in the First Series Population Bulletins 
showing the 1940 population of each tract (1930 population figures are 
shown where available). A map of the city by census tracts is included. 

Additional information on the characteristics of housing by census 
tracts is being presented in a Supplement to the First Series Housing 
Bulletins. 

For each of the 191 cities which had a population of 50,000 or over in 
1930 a supplement to the First Series Housing Bulletins will be issued 
containing three tables: 


1. Characteristics of Housing for the City: 1940, 
2. Characteristics of Housing by Census Tracts: 1940, 
3. Characteristics of Housing for Census Tracts by Blocks: 1940. 


For additional guidance in the application of census data to the prob- 
lems of a community it will be well to secure a copy of “Key to the 
Published and Tabulated Data for Small Areas.” Many useful facts are 
tabulated which the Bureau does not feel justified in publishing be- 
cause they are of interest only to a few groups or individuals. These 
tabulated data are available for the nominal charge of copying or re- 
producing. Other facts on housing, population, and retail trade are 
punched on cards but not tabulated. Special tabulations can be made 
at the expense of those who want them for tracts or combinations of 
tracts, wards, blocks, and enumeration districts, except that in the case
of retail trade it is not considered practicable to go beyond combinations
of census tracts due largely to danger of disclosing individual
operations. Far too few businessmen know the profit possibilities of 
this unpublished and untabulated information, available at a negligible 
fraction of the cost of collection. Enquiries regarding unpublished data 
are welcomed. 

One of the greatest values of census tract and block data is that they 
can be used as a background to which facts collected by others can be 
related. The businessman can, in seeking a solution for his local prob- 
lems, even collect additional information by these same areas to supple- 
ment the basic information furnished by the Census. 

To the manufacturer, his advertising agency, and his marketing con- 
sultants the potential dividends from this small area material are par- 
ticularly great. The detailed study of individual cities can be equally 
valuable as market analysis and as market synthesis. From a detailed 
knowledge of the tracted cities he can construct accurately the characteristics
of his total market. By conducting trial campaigns in carefully
chosen trial markets the laboratory method is effectively applied to 
marketing. These trial markets no longer need be chosen by rule of 
thumb only to prove nonrepresentative and disappointing. They can 
be chosen on the basis of detailed analysis and the conditions may be 
controlled as desired. 

Most market surveys must be on a sample basis—and on a “thin” 
sample at that. In such sampling tract and block data assure us of 
adequate controls, proper stratification, representativeness of the uni- 
verse being sampled, and either consistency or the known causes of 
inconsistency. These are guides long overdue. 
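One way tract totals can serve as sampling controls is proportional stratification, in which each tract receives a share of the sample equal to its share of the universe. The sketch below is only an illustration of that idea: the tract names, the dwelling-unit counts, and the function name are hypothetical, and the paper does not prescribe this particular allocation rule.

```python
def proportional_allocation(stratum_sizes, sample_size):
    """Split a sample across strata in proportion to their known sizes.

    Fractional shares are rounded down, and the interviews left over are
    handed to the strata with the largest remainders.
    """
    total = sum(stratum_sizes.values())
    exact = {name: sample_size * size / total for name, size in stratum_sizes.items()}
    allocation = {name: int(share) for name, share in exact.items()}
    leftover = sample_size - sum(allocation.values())
    by_remainder = sorted(exact, key=lambda name: exact[name] - allocation[name], reverse=True)
    for name in by_remainder[:leftover]:
        allocation[name] += 1
    return allocation


# Hypothetical occupied dwelling units by census tract.
dwelling_units = {"tract 1": 1200, "tract 2": 800, "tract 3": 2000}
print(proportional_allocation(dwelling_units, 100))
```

The published tract tables supply the stratum sizes, so the only figure the surveyor must choose is the total number of interviews.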

Where do families live? How large are they in various parts of the
city? What kind of homes do they live in? Do they own or rent them? 
Where are the declining areas, the growing areas? What will be the 
direction of development? Where do the various racial and language 
groups live? Through what media, type of appeal, and type of salesman 
can they be reached? What is the purchasing power in the various 
areas? What parts of the city are worth intensive sales effort—and 
which can best be left to competitors? Which retail outlets should carry 
your high quality and your low quality lines? Which should carry the 
entire line and which should carry only certain items? How much in 
quantity and quality can you reasonably expect to sell in a given city— 
or parts of it? 

Knowing the answers to these questions (and they are there for most
products) improves your salesmen’s percentage of sales to calls and
increases the size of sales. Sales quotas are reasonable expectancies of
sales based on facts. They are given in detail for large city markets. They
can be made the basis for real selective selling. Certain products will,
or will not, sell best in homes having mechanical refrigerators, electric
lights, bathtubs, oil heat or radios. It is easy to find where such homes 
are concentrated and how many there are of them. Knowing keeps 
sales costs down. 

Some goods have highly specialized markets. Others have universal 
coverage. Types of clothing worn are related to age, sex, and race com- 
position of the population. Reading matter is directly related to educa- 
tional level. With the new census questions on education, income, and 
work status, and more complete tabulations on age, sex, color composi- 
tion, marital status, labor force data, and the entirely new housing 
census, marketing research has come of age. 

Banks and real estate interests have long been users of census tract 
data and they will undoubtedly find the block tabulations equally 
valuable. The housing census particularly affords them a wealth of 
information which they have never had before and it breaks these data 
down into the small packages necessary for most effective use. 

Types of dwellings and the relative proportion of different types are 
important indicators of loan values. Single-family houses surrounded 
by multi-family homes have a loan value different from a single-family 
house in a single-family neighborhood. Value of dwellings is partly de- 
termined by location in relation to stores, factories, gasoline stations, 
and other special types of properties. A decrease in families or popula- 
tion count in an area indicates a declining trend in property valuation. 
The existence or non-existence of adequate heating, cooking, lighting, 
and bath room facilities indicates the commercial possibilities of prop- 
erties. 

As a guide to intelligent appraisal of land values and building im- 
provements, the population and housing statistics by small areas should 
be a godsend. Mortgage conditions will be shown by tracts including 
the number of homes mortgaged, outstanding indebtedness, holder of 
mortgage, and total interest rate. 

One banker states that by capitalizing the rental values in any area 
he gets a very definite value for real estate in that area. He also says 
that he uses census facts for arriving at banking policies, for the banker 
must know where, as well as when, to expand or contract loans. 

Public utilities are constant users of intra-city area statistics. Upon 
them they must formulate their programs and plans as to the extension 
of telephone, electric light, gas, and transportation service. A telephone
executive states that his company must build a central office building
and fill it full of equipment with an investment of from two to four 
million dollars. That building must have a satisfactory location. The 
population and housing characteristics of a city determine the location. 
These concerns must know where vacancies are high, where they are 
low and in what types of dwellings. They must know how many families 
live in one-family dwellings, how many live in apartments, and how 
many live in other types of multi-family units. Some of the public 
utilities have done particularly fine jobs. Others are expecting to do 
similar jobs in making surveys of their areas based on census tracts. 
One example of these is the Brooklyn Market Survey, with its many 
tract maps, made by the Brooklyn Edison Company in 1936. With the 
new facts available for 1940, both the number and the usefulness of 
these surveys will undoubtedly increase. 

Many of the telephone companies in the past have made special ar- 
rangements with the Bureau to secure tabulations by enumeration dis- 
tricts since it was found necessary to have data by areas smaller than 
census tracts in order to analyze specific telephone service areas. 

A telephone engineer writes, “The announced plans of the U. S. 
Census Bureau to publish their field data by blocks for the larger cities 
were received enthusiastically by telephone engineers since such in- 
formation will prove of tremendous value to this industry.” 

Wide-awake retailers long ago learned the value of proper location 
for their stores. Now any retailer desiring to determine the best location 
for different types of stores has available to him practically all the facts 
he needs from the census, except traffic, upon which to make his choice 
of a site. He can certainly know his potential customers better and find 
better ways to reach the types of consumer which he wishes to cover. 

The department store and the chain store, through tract and block 
data, can more intelligently locate warehouses and branches, plan their 
delivery services, determine effective advertising appeals, and choose 
the media best adapted to reaching their customers. 

Chambers of commerce have always had a deep interest in facts 
which reflected conditions and characteristics in various parts of their 
cities. This knowledge is fundamental to any understanding of real 
conditions and to improving them. From community betterment and 
health campaigns to campaigns for getting new factories, these intra- 
city breakdowns are serving these organizations. 

One chamber reports using them to help young physicians choose a 
location having two specific characteristics: First, it had to be a growing
neighborhood, and, second, it had to be a neighborhood offering
opportunities to become “company doctor” for one or more factories.
This same organization has also used the tract data to help men choose 
locations for motion picture theatres, shoe stores, hardware stores, and 
an aviation plant. 

In selecting and soliciting business or manufacturing concerns most 
apt to benefit the community and most apt to succeed in the locality,
census facts are always more convincing than unsupported glibness.
There is no better approach than presenting incontrovertible figures on 
the labor force, living conditions, purchasing power, and property 
values. 

An automobile club executive says, “It would take a volume to report 
to you all the uses which we make of your Census Tracts and Blocks. 
However, briefly, our entire Membership Department is based on cen- 
sus tract information. Our various territories for representatives to 
work are set up in accordance with census tracts and all prospects for 
membership are so classified. We use this information in evaluating our 
districts from an economic standpoint and it has been of great help in 
enabling us to make various research studies which we use in connec- 
tion with the solicitation of membership. We are now using it exten- 
sively in a study we are making of the results obtained from automobile 
driver education which is being carried on in the local High Schools.” 

The three outstanding news media—radio stations, magazines, and 
newspapers—are constantly coming to us to learn more about the ap- 
plications of our statistics to intra-city areas. 

Radio stations in reporting to the Federal Communications Commis- 
sion must sometimes furnish population or number of families within 
certain intensity bands. By breaking the tracts which fall on band 
limits into blocks, an excellent analysis can be made for this purpose. 
Those in Cleveland are certainly aware of these possibilities and have 
long used them through applying the results of their Real Property 
Inventory. Census facts can be used in any of the tracted cities for this 
purpose. 

Magazines and newspapers in any analysis of their readers, in setting 
up circulation plans, and in assuring maximum returns from the adver- 
tising carried by them will find the tract and block data indispensable. 

If there is any slogan which the newspaper publisher should take to 
his heart it is “Know thy city well to serve it best.” Editorials, news arti- 
cles, and promotion plans are dependent upon factual knowledge of the 
city and its parts. 

The progressive newspaper publisher must be capable of feeding facts 
as well as life into his news, advertising, and circulation departments no
less than into his editorials. Every publisher is interested in one certain
community (or group of such communities)—a city and its hinterland. 
The city is the yolk and the hinterland is the white of the egg from 
which newspaper profits hatch. That egg must, however, be a sound one 
fertilized with creative imagination and progressive management lest 
it rot rather than hatch. Trial and error methods mean low hatching 
scores even with the best of incubators. The census provides the means 
for grading and candling that egg and determining in advance the con- 
ditions which will assure the hatching and maturing of the chick. 

A newspaper which has large-scale home deliveries in a tracted city 
can allocate its subscribers by these tracts, then study intensively their 
social and economic characteristics—their economic status, education, 
color-nativity, age, sex, occupational and industrial composition, and 
the nature of their homes and the equipment in them. 

Within the bounds of this paper it has been impossible to treat in de- 
tail the methods of applying tract and block data to the solution of 
many specific business problems. Certainly the applications covered 
can be adopted or adapted with profit by any businessman who wishes 
to do a better job of evaluating the opportunities and problems existing 
in larger cities. Putting to work the basic facts which are made so easily 
available to him in packages conforming to his practical needs is now 
only a matter of choice. However, even census facts possess value only 
through use. 











GENERAL PRINCIPLES OF TRACT DELIMITATION* 


By C. E. Batschelet, Geographer
Bureau of the Census 


NOW THAT THE ACTUAL enumeration of the Census of 1940 is completed
and the results are being published, it seems a good time to
consider the entire tract program and to lay down certain general prin- 
ciples which our experience has shown us should be followed in de- 
limiting these areas. 

Prior to the last census, the cost of all tract tabulations was defrayed 
by the tract sponsoring groups, and the Bureau of the Census merely 
constituted a “Board of Review” to determine whether the tracts were 
established in conformity with certain fundamental principles. How- 
ever, as the Bureau now tabulates and publishes the tract statistics, 
our part in the program should no longer be merely a review of the 
tract layout, but we should take an active part in the delimiting of the 
areas to the end that uniformity may be achieved—not uniformity in 
the size or composition of the tracts, but uniformity in the numbering, 
boundary descriptions, and problems of a similar nature. For example, 
at the present time, there are probably seven or eight different number- 
ing systems in use, which is not good practice. 

In the light of these facts, the Bureau of the Census has outlined the 
following course of action which each city interested in a tract setup 
will be asked to follow: 


(1) The submission to the Bureau of the Census of a cadastral 
map, preferably on a scale of 1,000 feet to the inch, showing 
the proposed tracts, as well as the blocks, street layout, and 
street names. This map should show the existing streets as dis- 
tinct from proposed street plans. 

(2) The submission to the Bureau of the Census of a list of the 
agencies, public and private, approving the tracts, as well as 
the names of any agencies disapproving them. 

(3) After the maps are reviewed by the Bureau of the Census, they 
will be returned to the tract sponsor, and the necessary changes 
must be made to adjust the tract lines to the mutual satisfac- 
tion of the participating agencies and this Bureau. 

(4) The submission to the Bureau of the Census of a statement 
from the Mayor, or other head of the city government, offi- 
cially approving the tract plan. 

(5) The publication of a tract outline map which can be used by 


* A paper presented at the 103rd Annual Meeting of the American Statistical Association, New
York, December 29, 1941.

local agencies for plotting statistical data, and the submission
of a copy of this map to the Bureau of the Census. 

(6) The publication of a tract street index and the submission of 
two copies of this index to the Bureau of the Census. 


As soon as all of the above conditions have been met, the tract spon- 
sor will be notified that the Bureau of the Census recognizes that the 
city is tracted. 

At the Census of 1940 there were 61 tracted cities of which 55 were 
municipalities of 100,000 or more population. This leaves 38 cities of 
the 100,000 class still untracted. It should be the goal to have tracts 
established in all of these cities in time for the 1950 Census. As a guide 
in the delimiting of tract areas, the Bureau of the Census is glad to send 
to any city considering the establishing of tracts a copy of a map show- 
ing the enumeration districts used in the city at the Census of 1940 
with the population reported for these enumeration districts. 

Further there were 24 cities which had discovered that tracts are 
just as essential in the territory surrounding the city as in the central 
city itself and had established tract units in the extra-municipal area. 
The Bureau of the Census feels that tracts should be established in the 
metropolitan areas of all of the 100,000 cities, and has accordingly 
drawn up the following regulations governing the delimiting of tracts: 


(1) Tracts in the areas surrounding cities should be limited to the 
metropolitan district as established by the Bureau of the Cen- 
sus or to the county in which the central city is located. 

(2) The tracts should conform to the boundaries of the county 
political subdivisions, i.e., each minor civil division should 
form a complete tract or be subdivided into two or more tracts. 
It is realized that for some cities this policy would occasion 
changes in the tract boundaries from decade to decade because 
of changes in the minor civil division lines. However, this now 
occurs in the periphery of a city, where the tract boundaries 
change on account of annexations to or detachments from the 
central city. 

(3) The delimiting of tracts in the metropolitan district should be 
under the direction of the tract sponsors in the central city 
working in close collaboration with county and city officials. 

(4) The tract sponsor should submit to the Bureau of the Census 
a cadastral map showing the tracts by the subdivisions speci- 
fied in paragraph 2. For use in this work, the Bureau of the 
Census will be glad to supply, upon request, copies of the 
county base maps on file. 














A METHOD OF ANALYZING THE ELEMENTS 
OF FORECLOSURE RISK* 


By Mortimer Kaplan
Federal Housing Administration 


THE FUNCTION of practical research in mortgage finance is to provide
a basis for lending policy which will circumscribe the three principal
problems of risk bearing, namely, the incidence and magnitude of
foreclosure expectancy, the incidence and magnitude of probable losses,
and the standards of risk selection. The need for research and its scope
were recognized by the officers of the Federal Housing Administration
simultaneously with the inception of mortgage insurance operations
and the program was provided for in its Division of Research and
Statistics. The justification and value of such studies to private lenders as
well as to the Federal Housing Administration should be obvious in
view of the widespread interest in FHA mortgages manifest in the
growth of the insured mortgage portfolios of financial institutions.

Foreclosure risk is the first of the two components of mortgage risk 
which must be studied to determine how risk factors are associated with 
bad mortgage loans. In establishing the limits of foreclosure risk for 
salient elements of the mortgage transaction, the maximum limits of 
loss risk, the second component of mortgage risk, are also delineated in 
terms of the risk elements. Loss risk, of course, will depend wholly on 
the financial experience in the disposition of the foreclosed mortgage 
security. 

The purpose of this paper is to present a method of analyzing the ele- 
ments of foreclosure risk and of measuring the variations in risk in 
terms of a foreclosure risk index and a foreclosure expectancy rate. This 
method was applied in the forthcoming monograph on the foreclosure 
experience with insured mortgages based on the first five years of operation
of the Mutual Mortgage Insurance program.¹ The monograph
covers the first of the three problems of risk bearing in both its phases,
namely, over-all foreclosure risk and the components of foreclosure
risk.² In the interests of economy of space, only the major features of
this technique will be treated. 


* A paper presented at the 103rd Annual Meeting of the American Statistical Association, New
York, December 27, 1941.

1 “Foreclosure Experience with Insured Mortgages: A Report on the First Five Years of Operation 
of the Mutual Mortgage Insurance Program,” Division of Research and Statistics, Federal Housing 
Administration, Washington, D. C. 

2 For a complete statement of the scope of this study consult “An Analysis of Foreclosure Risk,”
Insured Mortgage Portfolio, Federal Housing Administration, VI, 1, Third Quarter, 1941, p. 16.

In presenting the method of measuring the foreclosure propensity
of the various elements of insured mortgage risk, a general statement
describing the technique of analysis and its rationale will precede the
more detailed description of the adjustments to this method which
were necessary to utilize all of the foreclosures which had occurred
on mortgages insured in these first five years.

The implementation of the method of determining the variations in 
foreclosure risk associated with the separate risk characteristics re- 
flected in the mortgage transaction, such as the type of borrower, type 
of mortgage, and type of mortgage security, involves compiling samples 
of good and bad mortgages for the various risk elements which are 
selected for study, and the preparation of parallel frequency distribu- 
tions for convenient class intervals. 

These percentage distributions are then compared to reveal differ- 
ences by the use of a per cent relative which is derived by dividing the 
percentage of foreclosed cases in each of the class intervals by the per- 
centage of good mortgages in the same class interval. 

The resulting relative, which may be called a foreclosure risk index, 
measures the risk of foreclosure for each of the groups in terms of the 
average foreclosure experience of all groups. Thus the values of the 
foreclosure risk index for any one of the classes which are less than 100 
—the average foreclosure risk for mortgages in all groups—indicate 
better-than-average foreclosure risks and the groups with index values 
greater than 100 are worse-than-average foreclosure risks. 
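The computation of the foreclosure risk index can be sketched in a few lines of Python. The class labels and the counts below are hypothetical and serve only to show the arithmetic; the function names are likewise illustrative and do not come from the monograph.

```python
def percentage_distribution(counts):
    """Convert class counts into a percentage distribution."""
    total = sum(counts.values())
    return {cls: 100.0 * n / total for cls, n in counts.items()}


def foreclosure_risk_index(foreclosed_counts, good_counts):
    """Per cent relative: share of foreclosures over share of good mortgages."""
    pct_foreclosed = percentage_distribution(foreclosed_counts)
    pct_good = percentage_distribution(good_counts)
    index = {}
    for cls, good_share in pct_good.items():
        if good_share == 0:
            raise ValueError(f"no good mortgages in class {cls!r}")
        index[cls] = 100.0 * pct_foreclosed.get(cls, 0.0) / good_share
    return index


# Hypothetical counts for three class intervals of one risk element.
foreclosed = {"class A": 10, "class B": 30, "class C": 60}
good = {"class A": 250, "class B": 500, "class C": 250}
for cls, value in foreclosure_risk_index(foreclosed, good).items():
    print(cls, round(value, 1))
```

In this made-up case classes A and B fall below 100 and class C falls well above it. Weighted by the percentages of good mortgages, the index values average to 100, which is why 100 represents the average foreclosure risk for all groups.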

The rationale of this method, briefly, is first to determine the pres- 
ence of differences from assumed uniformity in a dichotomous classifi- 
cation of mutually exclusive categories of good and bad mortgages; and, 
second, to measure these differences in terms of an average foreclosure 
experience with the mortgages under observation. The degree to which 
these differences are significant in terms of a reliable sample can be 
determined by the application of accepted statistical tests of signifi- 
cance. This is essentially the basis of the statistical technique employed 
in the above-mentioned study. 
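The text does not say which test of significance was applied. As one accepted possibility, the sketch below computes Pearson's chi-square statistic for the two-row table of foreclosed and good mortgages by class interval. The choice of test, the function name, and the counts are assumptions made for illustration only.

```python
def chi_square_statistic(foreclosed_counts, good_counts):
    """Pearson chi-square for a 2 x k table of foreclosed vs. good mortgages.

    Returns the statistic and its degrees of freedom, k - 1.
    """
    classes = list(good_counts)
    total_foreclosed = sum(foreclosed_counts[c] for c in classes)
    total_good = sum(good_counts[c] for c in classes)
    grand_total = total_foreclosed + total_good
    statistic = 0.0
    for c in classes:
        column_total = foreclosed_counts[c] + good_counts[c]
        for observed, row_total in (
            (foreclosed_counts[c], total_foreclosed),
            (good_counts[c], total_good),
        ):
            # Expected count if both groups shared one distribution.
            expected = row_total * column_total / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic, len(classes) - 1


# Hypothetical counts; compare the statistic with a chi-square table.
foreclosed = {"class A": 10, "class B": 30, "class C": 60}
good = {"class A": 250, "class B": 500, "class C": 250}
print(chi_square_statistic(foreclosed, good))
```

A statistic near zero corresponds to the assumed uniformity, that is, identical percentage distributions for good and bad mortgages. A large value relative to the tabled chi-square for the stated degrees of freedom indicates that the differences are unlikely to be due to sampling alone.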

Because of the limitations of the available data on good and bad 
insured mortgages, the method actually employed in the analysis of 
foreclosure risk is an approximation of the method described above. 
These limitations arise principally out of the fact that the typical in- 
sured mortgage is a long-term amortized loan, the first of which was 
made under the Mutual Mortgage Insurance Program in 1935, and 
that a relatively insignificant number of foreclosures has since occurred. 
This paucity of experience is responsible, in the first instance, for the
modification of the definitions of terms used and, in the second instance,
for the necessity of the adjustment in the basic data. 

From a contractual point of view a good mortgage is one which is 
paid off according to the terms of the mortgage instrument. Such a defi- 
nition is obviously not convenient to use in an analysis of the kind of 
long-term amortized mortgages insured by FHA. It will be many years 
hence before a sufficiently adequate sample of this highly desirable 
classification of mortgages will be amenable to analysis. 

On the other hand, from a financial point of view, a good mortgage 
might include one on which the gross income and proceeds realized on 
sale in case of foreclosure are sufficient to cover all losses including ex- 
penses. However, since risk of foreclosure and not risk of loss is the ob- 
jective of analysis, this profit and loss definition is ipso facto inade- 
quate. 

In lieu of these two currently inadequate definitions of a good mort- 
gage, a definition, which provides a reasonably good approximation of 
the conditions implicit in the first of the proposed definitions and which 
affords at the same time an adequate workable sample of cases, is to be 
found in the mortgages-in-force, that is, the mortgages in good stand- 
ing as of a particular date. The adequacy of this definition rests on the 
assumption that the bulk of the mortgages made will survive to matur- 
ity. 

This difficulty in formulating a useful definition of good mortgages 
is not encountered in the definition of bad mortgages. For purposes of 
an analysis of foreclosure risk, a bad mortgage can be taken as being 
synonymous with a foreclosed mortgage. Foreclosure is an articulate 
legal status and its legal definition is useful to the kind of empirical ap- 
proach undertaken here. 

Ideal basic statistics for a foreclosure analysis would be provided if 
all mortgages were made at approximately the same time, say within 
the same year, and if the mortgages-in-force and the foreclosures were 
derived from this original portfolio. The sampling problem would con- 
sist in selecting all or a representative sample of those mortgages which 
survived during this period and all or a representative sample of those 
mortgages which were foreclosed. Under such conditions all mortgages 
in this ideal portfolio would be exposed to the same general risk factors. 

The problem of selecting the samples of good and bad insured mort- 
gages was automatically determined by the nature of the universe of 
each of these categories. The surviving mortgages as well as the foreclosed
ones come from mortgages made in each of the five years between
1935 and 1939. With respect to the mortgages-in-force, the homogeneity
of the mortgages insured was affected principally by the
circumstances under which the Mutual Mortgage Insurance program 
got under way and by the subsequent amendments to the National 
Housing Act. 

At the inception of insuring operations, the applications for FHA- 
insured mortgages came principally from home owners interested in 
refinancing their mortgages under more favorable terms. A much 
smaller, almost negligible, proportion came from purchasers of new 
homes. This was to be expected since home building activity had not 
yet reacted to the stimulus of federal recovery legislation. As time 
went on, this situation was reversed and purchasers of new houses, i.e., 
homes less than a year old at the time of commitment of insurance, 
represented a preponderant proportion of mortgagors under the Mu- 
tual Mortgage Insurance program. In Table I these changes can be seen 
in bold relief. 


TABLE I

PERCENTAGE DISTRIBUTION OF MORTGAGES INSURED ON TOTAL, NEW, AND
EXISTING 1- TO 4-FAMILY HOMES, 1935-1939

Year insured        1935      1936      1937      1938      1939

All homes         100.00    100.00    100.00    100.00    100.00
New homes          28.13     35.84     49.84     60.05     71.08
Existing homes     71.87     64.16     50.16     39.95     28.92





Up to the end of 1937 the gradual changes in the character of the 
mortgage security can probably be explained by the general recovery in 
economic conditions and particularly in building construction. The 
sharp reversal in the type of mortgage security after 1937 can be ex- 
plained in large part by the amendments of February 3, 1938, to the 
National Housing Act, which gave special consideration to the mort- 
gage financing of new homes by extending the maximum term to 25 
years and by raising the maximum loan-value ratio to 90 per cent. Not 
only did the proportions as between new and existing homes change, as 
indicated in the previous table, but the volume of mortgages increased 
markedly during this period. Moreover, significant changes were re- 
flected in the average characteristics of the mortgages made in each of 
the years of this period, such as the lower average valuations, the longer 
average maturities, and higher loan-value ratios. Table II presents the 
proportion of mortgages insured in each of the five years, as well as the 
distribution of foreclosures for the same period. 

These factors not only affected the homogeneity of the mortgages- 
in-force but also affected the distributions of the foreclosures out of 
which they had to come. The character of the distributions of the
so-called good mortgages, or mortgages-in-force, would be largely
determined by the mortgages made in the later years, while the character of
the foreclosures would be largely determined by those mortgages made 
in the earlier years. 


TABLE II

PERCENTAGE DISTRIBUTION OF MORTGAGES INSURED AND FORECLOSED ON
TOTAL, NEW, AND EXISTING HOMES SECURING MORTGAGES INSURED,
1935-1939, BY YEAR INSURED
Based on mortgages foreclosed through June 30, 1940

                   All homes            New homes          Existing homes
Year insured   Insured Foreclosed   Insured Foreclosed   Insured Foreclosed

1935             5.03     12.17       2.53      5.09       8.18     17.96
1936            16.58     31.07      10.64     25.57      24.09     35.57
1937            21.92     36.78      19.56     40.54      24.89     33.70
1938            23.46     17.22      25.24     24.64      21.22     11.15
1939            33.01      2.76      42.03      4.16      21.62      1.62
Total          100.00    100.00     100.00    100.00     100.00    100.00





In order to have an adequate sample, all foreclosures on mortgages 
insured during this five-year period, 1935-1939, had to be used. How- 
ever, there are differences from the assumed uniformity of the distribu- 
tions of good and bad mortgages which could be attributed to the fac- 
tors responsible for the heterogeneity of the mortgages and the varying 
exposure periods prior to foreclosure, and not necessarily to differences 
which could be attributed to foreclosure risk. This heterogeneity char- 
acteristic of the good mortgages can be corrected by introducing a 
parallel heterogeneity in the sample of foreclosures—a somewhat un- 
orthodox statistical concept whose meaning will become clear presently. 
This is achieved by adjusting the varying exposure periods prior to 
foreclosure to periods of comparable length. 

The adjustment developed to compensate for these two factors is 
based on the monthly foreclosure mortality experience table for in- 
sured mortgages. This foreclosure mortality table is constructed on the 
basis of the same principles as mortality tables on human life used by 
life insurance companies in the calculation of an appropriate premium 
rate. Just as a mortality table is designed to answer the question on the 
number of insured lives at various ages who are expected to die within 
a given period, so the mortality table for foreclosures presents the num- 
ber expected to be foreclosed of the number of mortgages insured for 
varying periods of time. 

Since the mortgages were insured in varying amounts in separate 
years, the desideratum is to reproduce a situation in which all the
mortgages were insured at the same time, or roughly in the same year, and
examine the character of the experience which could be expected at the 
end of the period. This can be achieved through the application of the 
foreclosure mortality tables. 

TABLE III

RANGE AND AVERAGE PERIOD OF EXPOSURE PRIOR TO FORECLOSURE AND
CORRESPONDING FORECLOSURE RATES ON TOTAL, NEW, AND EXISTING HOMES
INSURED FROM 1935 THROUGH 1939, AS OF JUNE 30, 1940

                                      Year insured
                    1935        1936        1937        1938        1939

Range in months exposed prior to foreclosure
All cases         1-66 mo.    1-54 mo.    1-42 mo.    1-30 mo.    1-18 mo.

Average exposure period prior to foreclosure
All homes        41.34 mo.   33.27 mo.   25.50 mo.   18.05 mo.   12.13 mo.
New homes        41.24       31.01       24.30       16.84       12.10
Existing homes   41.36       33.99       26.09       18.86       12.18

Average exposure period prior to foreclosure for all homes ........ 26.11 mo.
Average exposure period prior to foreclosure for new homes ........ 22.37 mo.
Average exposure period prior to foreclosure for existing homes ... 28.76 mo.

Foreclosure mortality rate for average exposure period
All homes        1.225%       .899%       .556%       .263%       .099%
New homes        1.139        .784        .561        .254        .128
Existing homes   1.257        .923        .554        .272        .081

Foreclosure rate for average exposure period for all homes ........ .544%
Foreclosure rate for average exposure period for new homes ........ .411%
Foreclosure rate for average exposure period for existing homes ... .627%

It should be clear that a greater number of foreclosures would have
resulted if all mortgages were made in the first year of the five-year
period. The longer the period of exposure to risk, the greater is the
proportion of foreclosures out of a given number of mortgages made at the
same time. The rationale of the adjustment method employed is as 
follows: On June 30, 1940, the mortgages which had been insured dur- 
ing the calendar year 1939 could have been foreclosed in any month 
from January 1939 through June 1940. The range of the period of ex- 
posure to foreclosure is thus from 1 to 18 months. The average period of 
exposure prior to foreclosure for the foreclosures on mortgages made in 
1939 is 12.13 months. The proportion of mortgages which could be 
expected to be foreclosed after an exposure period of 12.13 months is 
.099 of 1 per cent. This foreclosure rate is derived from the foreclosure 
mortality table. 

Table III presents the range in months, the average period of ex- 
posure, and the corresponding foreclosure rates for mortgages on total, 
new, and existing homes made in the separate years between 1935 and 
1939. 

It is apparent that if the mortgages made in 1939 had been made in
1935, there would have been many more foreclosures according to the
foreclosure mortality table. Since the foreclosure rate for the 1939
mortgages is, according to the preceding table, .099 of 1 per cent, and
the rate for the 1935 mortgages is 1.225 per cent, then 12.374 times
(1.225 ÷ .099) as many foreclosures could have been expected by June
30, 1940, if the 1939 mortgages were made in 1935. The rate of
foreclosure for the 1935 mortgages is thus 12.374 times the rate for the
1939 mortgages.

TABLE IV

ACTUAL AND ADJUSTED FORECLOSURES ON MORTGAGES SECURED BY ALL, NEW,
AND EXISTING 1- TO 4-FAMILY HOMES INSURED, 1935-1939
Based on mortgages foreclosed through June 30, 1940

Year                                       All        New     Existing
insured                                   homes*     homes     homes

1935   Actual foreclosures                  260        49        211
1936   Actual foreclosures                  664       246        418
       Expectancy adjustment factor       1.395     1.453      1.362
       Adjusted foreclosures                926       357        569
1937   Actual foreclosures                  786       390        396
       Expectancy adjustment factor       2.151     2.030      2.269
       Adjusted foreclosures              1,691       792        899
1938   Actual foreclosures                  368       237        131
       Expectancy adjustment factor       4.533     4.484      4.621
       Adjusted foreclosures              1,668     1,063        605
1939   Actual foreclosures                   59        40         19
       Expectancy adjustment factor      11.034     8.898     15.519
       Adjusted foreclosures                651       356        295

Total actual foreclosures                 2,137       962      1,175
Total adjusted foreclosures               5,196     2,617      2,579

* The discrepancies between the values of the expectancy factors for all homes in this table and
those in the text are explained by the somewhat different definitions of new and existing homes used
in the construction of the mortgage mortality table. The differences in rates are, however, insignificant.

The value 12.374 is called an “expectancy adjustment factor.” This
is used to convert the actual foreclosures on mortgages made in 1939 
to what could be expected to prevail if these mortgages were made in 
the first year of the five-year period. 

Expectancy adjustment factors are computed for the foreclosures on 
the mortgages insured in each year between 1936 and 1939. For the 
foreclosures on mortgages made in 1935, they are unity because it is 
this year which is used for the base foreclosure experience. 
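
The arithmetic of the expectancy adjustment can be sketched in a few lines of Python. This is an illustrative sketch and not part of the original study: the function and variable names are hypothetical, the inputs are the foreclosure mortality rates printed in Table III and the actual foreclosures printed in Table IV for new and existing homes, and rounding the adjusted counts to whole cases is an assumption.

```python
# Foreclosure mortality rates (per cent) for the average exposure period,
# by year insured, as printed in Table III.
RATES = {
    "new":      {1935: 1.139, 1936: 0.784, 1937: 0.561, 1938: 0.254, 1939: 0.128},
    "existing": {1935: 1.257, 1936: 0.923, 1937: 0.554, 1938: 0.272, 1939: 0.081},
}

# Actual foreclosures through June 30, 1940, as printed in Table IV.
ACTUAL = {
    "new":      {1935: 49, 1936: 246, 1937: 390, 1938: 237, 1939: 40},
    "existing": {1935: 211, 1936: 418, 1937: 396, 1938: 131, 1939: 19},
}

BASE_YEAR = 1935  # the year used for the base foreclosure experience


def expectancy_factors(rates, base_year=BASE_YEAR):
    """Ratio of the base-year rate to each year's rate (unity for the base year)."""
    return {year: rates[base_year] / rate for year, rate in rates.items()}


def adjusted_foreclosures(actual, factors):
    """Actual foreclosures scaled to the base-year exposure, rounded to whole cases."""
    return {year: round(actual[year] * factors[year]) for year in actual}


for kind in ("new", "existing"):
    factors = expectancy_factors(RATES[kind])
    adjusted = adjusted_foreclosures(ACTUAL[kind], factors)
    print(kind, {y: round(f, 3) for y, f in factors.items()}, sum(adjusted.values()))
```

Run as written, the factors agree with those printed in Table IV for new and existing homes, and the adjusted totals come to 2,617 and 2,579 respectively.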

Actually, of the mortgages made in 1939, only 59 mortgages on 1- to
4-family homes were foreclosed by the end of June 1940. On the basis of
the foreclosure mortality table and the derived expectancy adjustment
factors for mortgages exposed an average period of 12.13 months,
12.374 times as many, or 730 foreclosures, could have been expected
had these 1939 mortgages been made in 1935.

TABLE V

FORECLOSURE EXPERIENCE WITH PROPERTY VALUATION ON SINGLE-FAMILY
HOMES SECURING MORTGAGES INSURED, 1935-1939
Based on mortgages foreclosed through June 30, 1940

                      Mortgages insured   Foreclosure                        Foreclosure
                          1935-1939       risk index,   Chi-square test     rate, adjusted
Property valuation     Number   Per cent   adjusted     of significance*      1935-1939

Less than $2,000        5,340     1.20       95.00      not significant         1.07%
$2,000 to 2,999        33,465     7.55      115.36      significant             1.31
3,000 to 3,999         81,638    18.41       82.35      significant              .93
4,000 to 4,999         97,889    22.08       81.70      significant              .93
5,000 to 5,999         85,424    19.27       77.01      significant              .87
6,000 to 6,999         61,517    13.87      101.95      not significant         1.15
7,000 to 7,999         30,457     6.87      110.33      not significant         1.25
8,000 to 9,999         26,082     5.88      176.53      significant             2.00
10,000 to 11,999       10,048     2.27      180.18      significant             2.04
12,000 to 14,999        6,373     1.44      238.89      significant             2.71
15,000 or more          5,144     1.16      213.79      significant             2.41
Total                 443,377   100.00      100.00      significant             1.13%

TABLE VI

FORECLOSURE EXPERIENCE WITH BORROWER’S ANNUAL INCOME ON SINGLE-
FAMILY OWNER-OCCUPIED HOMES SECURING MORTGAGES INSURED, 1935-1939
Based on mortgages foreclosed through June 30, 1940

                      Mortgages insured   Foreclosure                          Foreclosure
Borrower's annual         1935-1939       risk index,   Chi-square test       rate, adjusted
income                 Number   Per cent   adjusted     of significance*        1935-1939

Less than $1,000        1,717      .47      129.79      not significant           1.39%
$1,000 to 1,499        19,532     5.40      102.96      not significant           1.11
1,500 to 1,999         68,724    19.01       97.69      not significant           1.05
2,000 to 2,499         90,366    25.01       89.44      significant                .96
2,500 to 2,999         54,075    14.96       82.15      significant                .88
3,000 to 3,499         41,840    11.58       99.31      not significant           1.07
3,500 to 3,999         27,488     7.61      116.32      significant               1.25
4,000 to 4,999         25,943     7.18      112.67      not very significant      1.21
5,000 to 6,999         19,001     5.26      140.30      significant               1.51
7,000 to 9,999          8,023     2.22      112.16      not significant           1.21
10,000 or more          4,716     1.30      176.15      significant               1.89
Total                 361,425   100.00      100.00      significant               1.08%

* Significant: computed values of χ² are significant according to the 1 per cent standard of
probability. Not very significant: computed values of χ² are significant according to the 5 per cent
standard of probability.

Table IV, presenting the expectancy adjustment factors and the
actual and adjusted foreclosures, illustrates how the total number of 
adjusted foreclosures is arrived at for all homes as well as for new and 
existing homes. 

These expectancy adjustment factors derived from the mortgage 
mortality table were then applied as weighting factors to the numerical 
distributions of foreclosures on mortgages insured in each of the five 
years for each of the 16 risk characteristics selected for analysis. The 
adjusted foreclosures in each of the class intervals were summed to give 
the adjusted percentage distribution used in the calculation of the fore- 
closure relative or risk index. 
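
The weighting step can be sketched as follows. This is a minimal sketch under stated assumptions: the risk index is taken to be 100 times a class interval's share of adjusted foreclosures divided by its share of mortgages insured (an interpretation consistent with the index and rate columns of Tables V and VI, but not spelled out in this passage); the factors are those printed for all homes in Table IV; and the foreclosure and insured counts, like the names `FORECLOSED`, `INSURED`, and `risk_index`, are hypothetical and serve only to show the mechanics.

```python
# Expectancy adjustment factors for all homes, by year insured (Table IV).
FACTORS = {1935: 1.000, 1936: 1.395, 1937: 2.151, 1938: 4.533, 1939: 11.034}

# HYPOTHETICAL inputs for two class intervals of one risk characteristic:
# actual foreclosures by year insured, and mortgages insured over 1935-1939.
FORECLOSED = {
    "class A": {1935: 10, 1936: 20, 1937: 25, 1938: 12, 1939: 2},
    "class B": {1935: 5, 1936: 12, 1937: 20, 1938: 15, 1939: 4},
}
INSURED = {"class A": 40_000, "class B": 60_000}


def adjusted_count(by_year, factors=FACTORS):
    """Weight each year's foreclosures by that year's expectancy factor and sum."""
    return sum(count * factors[year] for year, count in by_year.items())


def risk_index(foreclosed, insured):
    """100 x (share of adjusted foreclosures) / (share of mortgages insured)."""
    adjusted = {c: adjusted_count(by_year) for c, by_year in foreclosed.items()}
    total_adjusted = sum(adjusted.values())
    total_insured = sum(insured.values())
    return {
        c: 100 * (adjusted[c] / total_adjusted) / (insured[c] / total_insured)
        for c in foreclosed
    }


index = risk_index(FORECLOSED, INSURED)
print({c: round(v, 2) for c, v in index.items()})
```

Under this definition an index above 100 marks a class that contributes more than its share of adjusted foreclosures, and the indexes weighted by the insured shares average 100 by construction.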

It must be pointed out that the values of the foreclosure risk index
so computed for any one of the risk elements are valid for only a
five-year period of the order which describes the economic conditions which
prevailed between 1935 and 1939. 

In order to determine the reliability of the samples of cases used and 
the significance of the differences in the distributions of the mortgages- 
in-force and the foreclosures from assumed uniformity, the Chi-Square 
test of significance using the one per cent probability standard was ap- 
plied. 

Since the adjusted foreclosures are the number that could be ex- 
pected if all the mortgages were exposed between 1 and 66 months, a 
crude foreclosure expectancy rate may be computed. At the end of five 
years of operation, the adjusted foreclosure rate on all mortgages 
insured by the Federal Housing Administration with comparable ex- 
posure periods is 1.12 per cent. This foreclosure rate is a crude rate and 
should be distinguished from the rates provided by the mortality table 
which are survivor rates. 

Tables V and VI are taken from the FHA study of foreclosure ex- 
perience to illustrate the application of this method of analysis in the 
case of property valuation and borrower’s annual income. 








SAMPLING ERRORS OF SYSTEMATIC AND 
RANDOM SURVEYS OF COVER- 
TYPE AREAS* 


By James G. Osborne†
Forest Service 


THE PURPOSE of the present paper is to report the results of an
investigation into the accuracy of sample estimates obtained from
systematic and from random samples. Also, in it will be described a
procedure for estimating the sampling error of an estimate based on a 
set of systematic observations. 

The studies reported here deal with the results of sampling to esti- 
mate the composition of an area by cover-type classes, but these esti- 
mates differ in no essential way from estimates of composition by many 
other criteria of classification, e.g., land-use classes, forest-condition 
classes, etc. In fact, the decision to use material of this kind in the study 
was based largely on convenience and availability of data, and not on a 
belief that the type of material selected is peculiarly suited to this kind 
of study. It should be pointed out, however, that the problem selected 
is one of real importance, since estimates of areas according to pre- 
scribed classifications are basic to most land-use planning and form an 
integral part of many surveys. 

The data used in the studies reported here were obtained from two 
cover-type maps, one representing a moderate-sized area and the other 
a large area; the sampling being moderately intensive in the first and 
very extensive in the second. Specifically, the first is a vegetative-cover 
and land-use map of an area 28 × 30 miles in southern California; the
sampling of this map being at the rate of one line per mile of width. 
The second is a forest-type and condition map of a section of north- 
western Washington running southward 80 miles from the Canadian 
border, and from Puget Sound to the summit of the Cascades—a dis- 
tance of roughly 60 miles; the sampling here was at the rate of one line 
in 10 miles of width, thereby following the national forest-survey pro- 
cedure. In both cases, continuous lines were drawn across the maps and 
the length of line traversing each type was recorded. 

* A paper presented at the 103rd Annual Meeting of the American Statistical Association in joint
session with the Institute of Mathematical Statistics, New York, December 30, 1941.
† The author is especially indebted to Miss M. M. Sandomire, who supervised and did much of the
computational work as well as offered a number of fruitful suggestions in the analysis, and to Dr. I. T.
Haig, for continued encouragement and support of the large clerical job involved.

The area in a specified type is assumed to occupy the same per cent of
the total area as the length of line through that type does of the total
length of line. As a check of this assumption, the area of cultivated land
on the map of the smaller area was measured with a planimeter and was
compared with that estimated by the line interception method, the
estimate being based upon all lines measured. By planimeter measure,
41.748 per cent of the area was in cultivated land; by the
line-interception method, the percentage was 41.896, a difference of
0.148 per cent for the total area, or 0.353 per cent of the area in
cultivated land. This is well within the limits of accuracy of the
planimeter used.

CHART I

A SYSTEMATIC SAMPLE OF 30 LINES ACROSS THE AREA IN CULTIVATED
LAND, CONSISTING OF LINE 16 IN EACH OF THE 30 MILES OF WIDTH

The results of the studies of the two maps will be considered in turn. 
The data for the California study consisted of type recordings on 600 
lines across the area, selected in the following way: Each of the 30 miles 
of the width of this tract was divided into 32 equal parts representing 
32 lines across the area. From the numbers 1-32, 20 numbers were se- 
lected at random and the line corresponding to each of these 20 num- 
bers was run in each mile. For example, line 4 was run in miles 1, 2, 3, 
. . . 30; then line 6 was run in each mile, etc. The observations on the 
600 lines so selected constituted the basic data from which samples 
were drawn. 

CHART II

A RANDOMIZED BLOCK SAMPLE OF 30 LINES ACROSS THE AREA, CONSISTING
OF 2 LINES SELECTED AT RANDOM IN EACH OF 15 BLOCKS 2 MILES WIDE

The 20 sets of 30 lines 1 mile apart constituted 20 systematic
samples. They are, however, random in their totals and therefore provide a
valid estimate of the variance of such complete surveys. For
comparison, 20 random samples were drawn in each of several ways. These
included among others (1) completely random (that is, sets of 30 lines
selected completely at random out of the 600 lines) and (2) stratified
random or randomized blocks. In (2), the area was divided into 15
blocks, each 2 miles wide, and two lines were drawn at random in each
block. Time permits reporting upon only one cover type, and for this
purpose the area in cultivated land is selected. For the other types the
implications are similar.
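
The three ways of drawing 30 lines from the 600 measured lines can be sketched as below. This is an illustrative sketch only: the function names are hypothetical, each line is identified by an assumed (mile, line number) pair, and Python's standard `random` module stands in for whatever random-number procedure was actually used.

```python
import random

MILES = 30               # width of the tract in miles
PARTS_PER_MILE = 32      # candidate line positions within each mile
LINES_RUN_PER_MILE = 20  # positions actually measured in every mile


def measured_line_numbers(rng):
    """Choose at random the 20 line numbers (out of 1-32) run in every mile."""
    return sorted(rng.sample(range(1, PARTS_PER_MILE + 1), LINES_RUN_PER_MILE))


def systematic_sample(line_no):
    """One systematic survey: the same line number in each of the 30 miles."""
    return [(mile, line_no) for mile in range(1, MILES + 1)]


def randomized_block_sample(line_numbers, rng, block_width=2, per_block=2):
    """Two lines at random from the measured lines in each 2-mile-wide block."""
    sample = []
    for first_mile in range(1, MILES + 1, block_width):
        candidates = [
            (mile, line_no)
            for mile in range(first_mile, first_mile + block_width)
            for line_no in line_numbers
        ]
        sample.extend(rng.sample(candidates, per_block))
    return sample


def completely_random_sample(line_numbers, rng, size=MILES):
    """Thirty lines drawn at random from all 600 measured lines."""
    population = [(m, l) for m in range(1, MILES + 1) for l in line_numbers]
    return rng.sample(population, size)


rng = random.Random(1942)  # fixed seed so the illustration is repeatable
lines = measured_line_numbers(rng)
print(systematic_sample(lines[0])[:3])
print(randomized_block_sample(lines, rng)[:4])
print(completely_random_sample(lines, rng)[:4])
```

Each function returns 30 lines, so the three kinds of survey are of the same intensity and their totals can be compared directly.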

Charts I and II show the area in cultivated land (cross-hatched), 
and two methods of sampling. 

To the nearest whole number, the standard deviation of the system- 
atic totals was 279 chains; that of the randomized block totals was 564; 
and that of the completely random totals was 1701. The means of the 
survey totals were, respectively, 28154, 28238, and 28657. It is thus seen 
that, for this type, the estimates by all three methods tend toward the 
same value, but that the standard deviation of the systematic totals is
only half as large as that of the randomized blocks, and only one-sixth 
as large as for completely random samples. 

The practical significance of the comparison of the accuracy of sys- 
tematic and randomized block estimates is impressive. Only one-fourth 
as much information is obtained from the stratified random as from the 
systematic sample. Why? At least a part of the answer is apparent 
from a glance at the distribution of the observations composing the 
first randomized block sample drawn (see Chart II). As would be ex- 
pected with blocks 2 miles wide, approximately one-fourth of the miles 
have no samples (actually 7 out of 30) while the same number have 2 
lines. Even more serious, as will generally be true, lines close together 
occur frequently, there being in this case 4 pairs of lines only } mile 
apart. With lines 28 miles long, a second line } mile from the first 
gives little additional information after the first has been measured. To 
offset this weakness, an unbiased estimate of the sampling error of a 
survey of this type is directly and easily calculated by a straightforward 
application of random sample error formulae. An estimate of the stand- 
ard deviation among whole surveys, of 575, calculated from the average 
within-sample mean-square, is close to the estimate of 564 based on 
survey totals. 
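
The "one-fourth as much information" comparison follows from the ratio of the variances of the survey totals. A minimal sketch, assuming that information is measured by the reciprocal of the variance (the function name is hypothetical; the standard deviations are those quoted in the text):

```python
# Standard deviations of survey totals (chains) quoted in the text.
SD_TOTALS = {"systematic": 279, "randomized block": 564, "completely random": 1701}


def relative_information(sd_design, sd_reference):
    """Information of a design relative to a reference design,
    taken as the inverse ratio of their variances."""
    return (sd_reference / sd_design) ** 2


for design, sd in SD_TOTALS.items():
    rel = relative_information(sd, SD_TOTALS["systematic"])
    print(f"{design:>18}: S.D. = {sd:5d} chains, "
          f"information relative to systematic = {rel:.3f}")
```

For the randomized block surveys the ratio is (279/564)², or about 0.245, which is the one-fourth cited above.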

Considering the systematic surveys again, it is well to examine the 
nature of the variation found in populations in place. It will be recog- 
nized first that subdivision of an area into blocks does not divide it into 
homogeneous strata. Rather, a variate changes continuously within a 
block, and from block to block. Generally, then, a variate measured 
first at a particular place and then at a place a differential distance 
away will undergo a differential change. The variate may, then, be 
considered as a continuous function of position and the problem of 
sampling reduces, sensibly, to one of curve fitting. Further, the correla- 
tion between observed values depends upon the distance between the 
points of observation. 

As with randomized block sampling, it is evident that since each mile 
is represented in the sample to the same extent as in the total area, any 
differences between miles contribute nothing to the sampling error. 
Also, it may be said that the observation in any mile is an estimate of 
any other observation in that mile. The problem may be restated, then, 
as: “Given the results of a survey, what is the error of predicting the 
results of another similar survey?” Considered in the light of linear 
least-squares regression theory, the measured length of line through a 
type, in any mile, may be considered the independent variable, and the 
estimate for any other line (not measured), in the same mile, as the
dependent variable. Then the variance of estimating any other line is
the variance of estimating its value when only the mile within which it
lies is known, multiplied by (1 − r²), where r² is the square of the
correlation coefficient of the line measured and the one to be estimated.
The correlation coefficient depends, for its value, upon the distance
between the measured and estimated line.

CHART III
CORRELATION COEFFICIENTS OF LINES 1, 2, AND 3 MILES
APART BASED ON LINES NOS. 4 AND 6
Correlation of observations on lines 4 and 6 with that on other lines less than 1 mile
distance also are shown.

This correlation coefficient can be estimated by calculating the cor- 
relation of lines measured at distances of one unit (mile in this case), 
two units, etc., and plotting the correlation coefficient as the ordinate
and the distance as the abscissa. As a control in drawing the curve, it is 
known that at zero distance the correlation coefficient is one. Expe- 
rience indicates the relationships to be exponential in form, hence plot- 
ting on semi-logarithmic paper is helpful. 
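
The estimation of the correlation curve can be sketched numerically. This is a sketch under assumptions, not the author's computation: the curve is assumed to have the exponential form r² = exp(−k·d), a least-squares line through the origin on the logarithmic scale stands in for drawing the curve by eye on semi-logarithmic paper, and the observations, correlations, and function names are all hypothetical.

```python
import math


def pearson(xs, ys):
    """Product-moment correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def lag_correlation(observations, lag):
    """Correlation between lines `lag` units apart within one systematic sample."""
    return pearson(observations[:-lag], observations[lag:])


def fit_decay_constant(distances, correlations):
    """Least-squares k in r**2 = exp(-k * d), forced through r = 1 at d = 0.

    Taking logarithms gives ln(r**2) = -k * d, a straight line through the
    origin on semi-logarithmic axes. Correlations must be non-zero.
    """
    num = sum(d * math.log(r * r) for d, r in zip(distances, correlations))
    den = sum(d * d for d in distances)
    return -num / den


# HYPOTHETICAL chains of one cover type on the same line number in ten miles.
chains = [520, 610, 700, 690, 640, 580, 450, 400, 380, 300]
print([round(lag_correlation(chains, lag), 3) for lag in (1, 2, 3)])

# HYPOTHETICAL correlations at 1, 2, and 3 miles, used to fit the curve.
print(round(fit_decay_constant([1, 2, 3], [0.80, 0.65, 0.50]), 3))
```

Forcing the fitted line through the origin of the logarithmic plot is what imposes the control point mentioned above, a correlation of one at zero distance.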

Chart III shows the calculated correlation coefficient of lines 1, 2, 
and 3 miles apart for systematic samples, lines Nos. 4 and 6 in every 
mile being the bases of the two parts of the figure. Values of the
correlation coefficient for lines less than one mile apart also are calculable
in this study and are shown here. 

The variance of an observation at a single place is taken as the
residual mean-squared deviation from a polynomial fitted by least squares.
Chart IV shows a plot of the observations of the 600 lines measured,
the number of chains (66 feet) of cultivated land crossed by the line
being plotted on the position of the line in miles from the left edge of the
map. A polynomial of degree 6 has been fitted to the observations on
line No. 16 in every mile.

CHART IV
NUMBER OF CHAINS (66 FEET) OF CULTIVATED LAND CROSSED BY EACH OF
600 LINES PLOTTED OVER THE POSITION OF THE LINE (IN MILES
FROM THE LEFT EDGE OF THE MAP)
A polynomial of degree 6 has been fitted to the observations on line No. 16 in every mile by
the method of least squares. Observations used are shown with heavier dots.

Experience has shown also that, when enough terms in the polynomial
have been used so that the introduction of terms of higher degree
does not reduce the residual mean square significantly at the 5
per cent level, the residual mean square is approximately equal to half
the mean-squared successive difference of the original observations;
i.e., $\delta^2/2$, where

$$\delta^2 = \frac{\sum_{i=1}^{n-1} (X_{i+1} - X_i)^2}{n-1}.$$

In this equation, the $X_i$ are the observations in miles 1, 2, 3, ...,
n−1, n, and n is the number of lines in the sample.
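
As a small illustration, half the mean-squared successive difference can be computed directly from the observations of one systematic survey taken in order of position. The function name and the sample values are hypothetical.

```python
def half_mean_squared_successive_difference(x):
    """Return delta**2 / 2, where delta**2 is the mean of the squared
    differences between successive observations X[i+1] - X[i]."""
    n = len(x)
    if n < 2:
        raise ValueError("at least two observations are needed")
    delta_sq = sum((x[i + 1] - x[i]) ** 2 for i in range(n - 1)) / (n - 1)
    return delta_sq / 2


# HYPOTHETICAL chains of cultivated land on one line number in six successive miles.
chains = [820, 905, 870, 990, 1010, 960]
print(half_mean_squared_successive_difference(chains))
```

Unlike the residual mean square from a fitted polynomial, this quantity requires no choice of degree and no curve fitting.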
The average squared error of estimating any survey from any
observed survey is, then,

$$S_T^2 = n s^2 (1 - \bar{r}^2),$$

where $s^2$ = residual mean square from a least-squares polynomial,
      $n$ = the number of lines in a survey,
and   $\bar{r}^2$ = the estimated average of the squares of the correlation
      coefficients of a measured line with all lines within the mile
      within which the observation occurs. For example, if line 10
      is observed, $\bar{r}^2$ is the average squared correlation coefficient
      for lines between 0 and 10/32 mile and between 0 and 22/32
      mile. In practice this is most easily obtained by integrating
      under the curve $r^2 = e^{-kd}$ between the limits 0 and 10/32, and
      0 and 22/32. In this equation $d$ is the distance between lines
      and $k$ is a constant, equal to $-2 \log_e r_{(1)}$, where $r_{(1)}$ is the
      correlation coefficient of lines one unit apart.
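
The calculation just described can be sketched as follows. This is an illustrative sketch under stated assumptions: the squared correlation is taken to fall off as exp(−k·d); the two areas under the curve are converted to an average by dividing by the width of the mile (one reading of "integrating under the curve"); and the function names and numerical inputs are hypothetical.

```python
import math


def decay_constant(r_unit):
    """k in r**2 = exp(-k * d), from the correlation of lines one unit apart."""
    return -2.0 * math.log(r_unit)


def mean_squared_correlation(k, line_no, parts=32):
    """Average of exp(-k * d) over the mile containing the observed line.

    The observed line lies line_no/parts of a mile from one edge of its mile
    and (parts - line_no)/parts from the other. The curve is integrated from
    0 to each of these distances and the total area is divided by the width.
    """
    a = line_no / parts
    b = (parts - line_no) / parts
    area = (1 - math.exp(-k * a)) / k + (1 - math.exp(-k * b)) / k
    return area / (a + b)


def sd_of_total(n_lines, residual_mean_square, mean_sq_corr):
    """Square root of n * s**2 * (1 - average squared correlation)."""
    return math.sqrt(n_lines * residual_mean_square * (1 - mean_sq_corr))


# HYPOTHETICAL inputs: correlation 0.8 between lines one mile apart,
# residual mean square of 2,500 square chains, line 10 observed, 30 lines.
k = decay_constant(0.8)
r_bar_sq = mean_squared_correlation(k, line_no=10)
print(round(k, 4), round(r_bar_sq, 4), round(sd_of_total(30, 2500.0, r_bar_sq), 1))
```

Because lines within the same mile are always less than one unit from the observed line, the average squared correlation lies between the squared one-unit correlation and one, so the systematic error is smaller than the error computed from the residual mean square alone.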


TABLE I

SUMMARY OF ESTIMATES OF STANDARD DEVIATIONS OF SAMPLE TOTALS FOR
FOUR MAJOR TYPES CALCULATED BY METHODS DESCRIBED ABOVE

                              Systematic                          Randomized block    Completely random
Type         S.D._T*  √[ns²(1−r̄²)]†  √[n(δ²/2)(1−r̄²)]‡  S.D._T§   S.D._T||  S.D._w¶   S.D._T**  S.D._w††
             Chains      Chains          Chains         Chains    Chains    Chains    Chains    Chains

Cultivated     279         254             240            694       564       575      1701      1731
Shrub          292         311             272            605       479       514      1610      1585
Grass          103         154             152            310       203       257       418       499
Woodland       111         130             133            217       156       188       224       254

* Standard deviation of totals of 20 randomly selected systematic samples.

† Square root of average of estimates of variances of totals based on systematic sample observations,
estimated from the residual mean square from a polynomial fitted by least squares and the
estimated average squared correlation coefficient of observed survey with all possible systematic
surveys.

‡ Same as † with half the mean-squared successive difference (δ²/2) substituted for the residual
mean square from a fitted polynomial.

§ Square root of average variance of totals estimated on assumption that lines systematically 1
mile apart were randomly located, two in each of fifteen 2-mile-wide blocks.

|| Standard deviation of 20 sample totals, the samples consisting of 2 lines at random in each
of 15 blocks.

¶ Estimate of standard deviation of sample totals (sampling according to ||) based upon the
average within-sample mean square.

** Standard deviation of 20 sample totals, the samples consisting each of 30 completely randomly
selected lines.

†† Estimate of ** based upon the average within-sample mean square.

Upon applying this formula to each of the systematic surveys, the
standard deviations were found to range from 192 to 308, the standard
deviation corresponding to the average variance being 254. This is to
be compared with 279, the standard deviation among the survey totals.
By comparison, the standard deviations of totals from the individual
randomized block samples ranged from 431 to 776, that corresponding
to the average variance being 575, as mentioned earlier.

CHART V

MAP SHOWING AREA IN DOUGLAS-FIR
One of the 30 randomized block samples consisting of 2 lines selected at random in each of four
20-mile-wide blocks is shown. The block boundaries are indicated by the broken lines.

Values of “t” were calculated for each of the systematic surveys and 
a comparison was made with the theoretical distribution of “t.” A 
Chi-square value of 10.5 was found with 10 degrees of freedom. 

Table I summarizes the results for the four major types found in 
the area. 

For the Washington map, only the results for the Douglas-fir type 
will be mentioned. Here, totals from 30 sets of 8 lines uniformly 10 
miles apart showed a standard deviation of 785 chains. Randomized 
block samples based on 2 lines in each of four 20-mile-wide blocks 
showed a standard deviation of 1,062 chains. Chart V shows the dis- 
tribution of the Douglas-fir type in the area studied with one of 30 
randomized block samples observed. The variance of the random surveys
is seen to be about twice (i.e., 1.83 times) as large as for systematic
surveys of the same intensity. Again, the mean values were very close,
being 8,340 and 8,280 chains for the systematic and the randomized
block surveys, respectively.


With the aid of the more simply calculated formula,

$$\text{S.D.}_T = \sqrt{n \, \frac{\delta^2}{2} \, (1 - \bar{r}^2)}$$

(where $\delta$ and $\bar{r}$ have the same meaning as previously), the estimated
standard deviations of totals were calculated from each of the 20 
systematic surveys. These ranged from 342 to 1,544 with that derived 
from the average variance being 955. This is to be compared with 785 
chains, the standard deviation of the actual totals. 

To summarize these tests of stratified random and systematic sur- 
veys of cover-type areas: 

1. For the material used in the test, stratified random surveys were 
only one-half to one-fourth as efficient as systematic surveys of the 
same intensity; 

2. If data taken systematically are used with random sample 
formulae, biased estimates of the sampling errors of totals or means 
result; 

3. It is demonstrated that with material of this kind, random sample 
formulae when applied to randomly selected observations yield de- 
pendable estimates of the sampling errors of totals; and 

4. It is demonstrated that from estimates of the correlation of 
measured and unmeasured lines dependable estimates of the sampling 
errors of systematic samples are obtained. 











SAMPLING WITH TRANSVERSE TRAVERSE LINES 


By Malcolm J. Proudfoot*
Bureau of the Census 


IN THE ANALYSIS of land use for the Tennessee Valley Authority, a
rapid reconnaissance technique was needed to sample the quantity
and distribution of various types of land at lower cost than the expense
of detailed mapping. In the years 1934-35, a traverse method of
estimation was tried out with the purpose of determining the adequacy of
sampling data so collected.

Traverses spaced at various intervals were run across completed
land-use maps and the lengths of various types of land were noted along
each traverse. These lengths, converted to percentages, were compared
with the percentages based on the areas taken by these types of land
as determined by planimeter measurement. This is essentially the
method introduced by Rosiwal¹ in 1898 for petrographic analysis. He
computed the content of various mineral components of rocks by
measuring the lengths occupied by these components along traverse
lines. J. M. Trefethen of the University of Wisconsin first applied
Rosiwal’s method to geographical reconnaissance.² His conclusions
coincided with Rosiwal’s, namely, that good results could be obtained when
the total length of the traverse lines exceeds 100 times the average
intercept of the field types traversed.

The importance of this ratio is illustrated by Chart I. Here is shown
a large square divided into fields of different types labelled from A to
E. Table I A, pertaining to the entire area assumed to be sixteen square
miles, gives in Column (1) the percentages taken by each field type as
determined by planimeter measurement, and in Column (2) the
percentages taken by each field type as determined by measuring the total
traverse intercepts shown by broken lines. Column (3) shows the
deviations between Columns (1) and (2) which result in a root-mean-square
error of 0.6. In this case the ratio of the total traverse length to average
field intercept is 88 to 1.

In contrast, Table I B, pertaining to the area of four square miles
(the lower right-hand quarter of Chart I), sampled by the same trav-



* The author acknowledges with gratitude several helpful suggestions made by Dr. W. Edwards 
Deming, Staff Mathematician of the Bureau of the Census, and the invaluable assistance rendered by 
Louise Waldruff in these ardous statistical tasks. 

1 August Rosiwal, Ueber geometrische Gesteinsanalysen, Ein einfacher Weg zur Ziffermaessigen 
Feststellung des quantitaetsverhaeltnisses der Mineralbestandtheile gemengter Gesteine (Wien: Koenigliches 
Kristliches Reichsanstalt, 1898). For a short summary of this work in English, see Albert Johannsen, 
Manual of Petrographic Methods (McGraw-Hill Book Co., 1918), pp. 291-229. 

2 Paper read before the Association of American Geographers, St. Louis, Missouri, December 1935. 
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erse spacing shows nearly a fourfold increase in the root-mean-square 
error. This is accompanied by the less refined ratio of 23 to 1 although 
it will be noted by inspection that the average field intercept is ap- 
proximately the same as for the entire area. 

A higher ratio achieved for the same area by twice the frequency of
lines (the additional lines are shown by dots) gives the results shown in
Table I C. The ratio of total traverse length to average field intercept
is now 46 to 1 and the root-mean-square error has decreased
approximately 50 per cent.


CHART I


From these comparisons it seems obvious that the greater the total 
traverse length and the narrower the spacing the smaller the resulting 
root-mean-square error in sampling the five field types. The root-mean- 
square errors obtained for these three examples, when plotted, give a 
suggestion of regularity. At infinitely close spacing there would be no 
error at all, hence the line must pass through the origin. 
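
The root-mean-square errors quoted for the three examples can be recomputed from the two percentage columns of Table I. The following is a minimal Python sketch of that arithmetic; the function name `rms_error` and the dictionary layout are illustrative choices made here, not part of the original study.

```python
import math

# Percentages from Table I: (planimeter, traverse) for field types A to E.
TABLE_I = {
    "A (ratio 88:1)": [(18.3, 19.2), (17.3, 16.8), (30.8, 31.3), (18.3, 18.0), (15.3, 14.7)],
    "B (ratio 23:1)": [(20.5, 22.0), (17.3, 15.3), (28.5, 28.4), (16.0, 19.3), (17.7, 15.0)],
    "C (ratio 46:1)": [(20.5, 19.6), (17.3, 18.4), (28.5, 28.6), (16.0, 14.4), (17.7, 19.0)],
}


def rms_error(pairs):
    """Root-mean-square of the deviations between paired percentages."""
    squares = [(traverse - planimeter) ** 2 for planimeter, traverse in pairs]
    return math.sqrt(sum(squares) / len(squares))


for label, pairs in TABLE_I.items():
    print(f"{label}: root-mean-square error = {rms_error(pairs):.1f}")
```

Computed this way, the three cases round to 0.6, 2.2 and 1.1 respectively, which agrees with the statement that doubling the frequency of lines roughly halves the error.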

The percentages of the total areas taken by the five field types
contributing to the root-mean-square error do not differ widely; none
predominates over the other and none is extremely small. It remains to
see what happens to the error of the traverse method if the percentage
taken by one of the field types is small.

TABLE I
COMPARISON OF TRAVERSE LENGTH WITH PLANIMETER MEASUREMENT

               Planimetered field    Traverse line     Deviation
Field type     measurements          measurements      in per cent
               in per cent           in per cent
                    (1)                  (2)               (3)

A. Ratio of Total Traverse Length to Average Field Intercept is 88:1

    A              18.3                 19.2               0.9
    B              17.3                 16.8               0.5
    C              30.8                 31.3               0.5
    D              18.3                 18.0               0.3
    E              15.3                 14.7               0.6
    Root-mean-square error                                 0.6

B. Ratio of Total Traverse Length to Average Field Intercept is 23:1

    A              20.5                 22.0               1.5
    B              17.3                 15.3               2.0
    C              28.5                 28.4               0.1
    D              16.0                 19.3               3.3
    E              17.7                 15.0               2.7
    Root-mean-square error                                 2.2

C. Ratio of Total Traverse Length to Average Field Intercept is 46:1

    A              20.5                 19.6               0.9
    B              17.3                 18.4               1.1
    C              28.5                 28.6               0.1
    D              16.0                 14.4               1.6
    E              17.7                 19.0               1.3
    Root-mean-square error                                 1.1


To obtain evidence on this question and in an effort to work out a
traverse precision chart for reconnaissance field purposes, the adequacy
of half- and quarter-mile spacings was tested out on an area
comprising 134 square miles and ten different land types. The area 
stretched from the Holston River to the Smoky Mountains and the 
various land types ranged from 56 per cent down to 7 per cent of the 
total area. In carrying out the experiment, the total area was divided 
into 134 square miles, and the percentage of each square devoted to 
each of the ten land types was computed from their areas as obtained 
by planimeter measurement. The results obtained by the half- and 
quarter-mile traverse lines, likewise expressed by percentages devoted 
to each land type, were then compared with the percentages obtained 
by the planimeter. For any land type in the entire field there were thus
134 percentages based on planimeter measurements, and 134
percentages based on the traverse lines. These 134 pairs were plotted on
a scatter diagram, for which a coefficient of correlation was computed.
The results are shown in Table II and Chart II. The correlations are
adequate, it will be observed, except for the land types having the
lowest percentage. The crosses (showing the correlations for the 80 to 1
ratios) invariably fall above the dots (showing the correlations for the
49 to 1 ratios), and moreover are more nearly uniform for the ten
different percentages.
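
The coefficient of correlation for one land type is the ordinary product-moment coefficient over the paired percentages. A minimal sketch follows; the six pairs are hypothetical stand-ins for the 134 square-mile pairs of the study, and the function name `pearson_r` is introduced here for illustration only.

```python
import math


def pearson_r(xs, ys):
    """Product-moment correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical percentages of one land type in six square-mile units.
planimeter = [12.0, 30.5, 45.0, 8.0, 22.5, 60.0]
traverse = [14.0, 28.0, 47.5, 5.0, 25.0, 58.0]

r = pearson_r(planimeter, traverse)
print(f"coefficient of correlation = {r:.4f}")
```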
CHART II

CORRELATIONS BETWEEN TRAVERSE AND FULL COVERAGE MAPPING FOR
TEN DIFFERENT LAND TYPES


These coefficients of correlation show that if the ratio of the total
length of the traverse line to the average field intercept of all fields is
sufficiently large, adequate results can be obtained for a field type
comprising any given percentage of the total area. However, for a field
type comprising less than 5 per cent, this ratio apparently must be
increased beyond 100 to 1, at which point the traverse field costs
approach the cost of full coverage. These indices therefore provide at
least the beginning of a traverse precision chart for use in forecasting
the accuracy of traverse sampling under varying field conditions.³


3 Had funds been available this chart might have been greatly extended and refined. As observed by
Dr. Deming, it would have been highly desirable to study the reliability of the results under different
ratios, using the average field intercept of each field type to the total traverse length. It is hoped that
someone with similar data and the funds and facilities will undertake such investigations.


TABLE II

CORRELATIONS BETWEEN TRAVERSE AND FULL COVERAGE MAPPING FOR TEN
DIFFERENT LAND TYPES

                          Arithmetic       Arithmetic       Coefficients
                          mean of the      mean of the      of correlation
Type of land              detailed         traverse         between the
                          field data,      data, in         traverse data
                          in percentages   percentages      and the detailed
                                                            field data
                              (1)              (2)              (3)

(Half-mile traverse spacing)

Not eroding                   56.1             58.1            .8873
[illegible]                   39.3             39.8            .9507
Sheet erosion                 33.3             32.8            .8323
[illegible]                   32.2             31.1            .8952
10-20 per cent slopes         27.1             26.9            .9578
0-10 per cent slopes          23.3             23.8            .9564
[illegible]                   21.6             21.8            .7452
20-40 per cent slopes         16.8             17.0            .8832
[illegible]               [illegible]           8.1            .6024
[illegible]                    7.4              7.9            .8542

Ratio of total traverse length to average field intercept: 49 to 1

(Quarter-mile traverse spacing)

Not eroding                   56.1             57.6            .9127
[illegible]                   39.3             39.2            .9767
Sheet erosion                 33.3             33.1            .9401
[illegible]                   32.2             31.9            .9657
10-20 per cent slopes         27.1             27.2            .9765
0-10 per cent slopes          23.3             23.6            .9764
[illegible]                   21.6             21.9            .9167
20-40 per cent slopes         16.8             17.7            .9375
[illegible]               [illegible]           7.9            .8322
[illegible]                    7.4              7.6            .9253

Ratio of total traverse length to average field intercept: 80 to 1


To show the utility of this chart, let us assume an area of
approximately 100 square miles. Limited funds and time render detail
mapping impracticable, yet quantitative data pertaining to the
proportional distribution of the major land types in this area are
required. The distribution of these types is to be shown by square-mile
units. To meet this objective, the following 10 steps can be taken:

1. Map three widely, but evenly, spaced traverse lines across the
area. If possible, run these lines transverse to the drainage pattern.
From the field data thus obtained compute the average field intercept
of all fields and the percentage of the total area taken by each of the
major land types.

2. Compute the ratio of the total traverse length to the average field 
intercept. Refer to the traverse precision chart. If this ratio is suffi- 
ciently high, a high coefficient of correlation is indicated for each land 
type comprising more than 7 per cent of the total sample. If these con- 
ditions are met the traverse lines adequately sample the entire area for 
the major land types concerning which information is desired. 

3. Lay off a square-mile grid on a master base map of the area. If 
possible, place this grid so that one side or the other of each square 
mile lies transverse to the drainage pattern of the area. 

4. Using the determined average field intercept and the percentage 
figures for the major land types, determine from the traverse precision 
chart the traverse spacing which will provide the necessary ratio within 
each square mile to give satisfactory coefficients of correlation. 

5. Lay out the determined traverse spacing on the grid of the master 
base map. 

6. Transfer these traverse lines to field maps cut to an appropriate 
note-book size. 

7. Map the land types occurring along each traverse line. 

8. By means of the traverse field data thus obtained compute the 
percentage of each land type occurring within each square mile. 

9. Plot these percentage figures on the square-mile grid of an outline 
map for each land type. 

10. Draw isopleth lines of equal percentage for each land type, select- 
ing intervals suited to the purpose of the investigation. 
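
The arithmetic of steps 1, 2 and 8 can be sketched as follows. This is an illustration under stated assumptions, not the author's procedure: the input format (a list of land-type and intercept-length pairs recorded along the traverses), the function name `summarize_traverse`, and the sample figures are all hypothetical.

```python
from collections import defaultdict


def summarize_traverse(intercepts):
    """Summarize traverse field notes.

    intercepts: list of (land_type, length) pairs, one per field crossed.
    Returns (ratio, percentages), where ratio is the total traverse length
    divided by the average field intercept, and percentages maps each land
    type to its share of the total traverse length.
    """
    total_length = sum(length for _, length in intercepts)
    average_intercept = total_length / len(intercepts)
    ratio = total_length / average_intercept

    length_by_type = defaultdict(float)
    for land_type, length in intercepts:
        length_by_type[land_type] += length
    percentages = {
        land_type: 100.0 * length / total_length
        for land_type, length in length_by_type.items()
    }
    return ratio, percentages


# Hypothetical notes for one traverse, lengths in miles.
notes = [("forest", 0.40), ("pasture", 0.25), ("forest", 0.35),
         ("crop", 0.50), ("pasture", 0.50)]
ratio, percentages = summarize_traverse(notes)
print(f"ratio = {ratio:.0f} to 1")
for land_type, pct in sorted(percentages.items()):
    print(f"{land_type}: {pct:.1f} per cent")
```

When the average is taken over every intercept recorded, the ratio is simply the number of field intercepts crossed, which gives a quick field check on whether more lines are needed.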











INDEX-NUMBER DIFFERENCES: GEOMETRIC MEANS 


By Irving H. Siegel
U. S. Bureau of Labor Statistics 


This is the last of three papers in this JOURNAL¹ on the difference
between indexes obtainable from the same set of relatives. The first
two papers were concerned with the difference between the Paasche
and Laspeyres formulas and the more general case of the difference
between any two arithmetic means. The area covered in those papers is
rather broad, since any harmonic mean of relatives and any weighted
or unweighted aggregative measure, including the Edgeworth, can also
be expressed as an arithmetic mean of relatives.² With the extension of
the discussion here to geometric means of relatives and to the “ideal”
index, the exploration of the entire domain of practical index numbers
is virtually completed.

The methods employed in this paper will be similar to those
employed in the earlier ones. In addition, use will be made of the
well-known facts that logarithms of geometric means reduce to linear
forms and that arithmetic means are necessarily greater than similarly
weighted geometric means if not all of the relatives are equal.³ The use
of logarithms will make it most convenient for us to compare the
magnitudes of the different index numbers in ratio form.

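First, we shall consider the ratio ($R$) between weighted and
unweighted geometric means ($G_w$ and $G$ respectively) of the same set
of $n$ relatives ($X_i = g_i'/g_i$):

$$R = G_w/G = \left(\prod X^{w}\right)^{1/\sum w} \Big/ \left(\prod X\right)^{1/n}.$$

Since

$$\log R = \frac{\sum w \log X}{\sum w} - \frac{\sum \log X}{n}
        = \frac{\begin{vmatrix} \sum w \log X & \sum w \\ \sum \log X & n \end{vmatrix}}{n \sum w}$$

and

$$\begin{vmatrix} \sum w \log X & \sum w \\ \sum \log X & n \end{vmatrix}
  = \begin{Vmatrix} w_1 & \cdots & w_n \\ 1 & \cdots & 1 \end{Vmatrix}
    \cdot
    \begin{Vmatrix} \log X_1 & \cdots & \log X_n \\ 1 & \cdots & 1 \end{Vmatrix},$$
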


1 See Irving H. Siegel, “The Difference between the Paasche and Laspeyres Index-Number
Formulas,” this JOURNAL, September 1941, pp. 343-350, and “Further Notes on the Difference between
Index-Number Formulas,” December 1941, pp. 519-524.

2 It should be noted that unweighted arithmetic and harmonic means can also be written as weighted
arithmetic means: $\sum X/n = \sum wX/\sum w$, where $w_i = 1 = w_i'(1/w_i')$, and
$n/\sum(1/X) = \sum kX/\sum k$, where $k_i = 1/X_i$.

3 Various algebraic devices leading to approximate expressions (e.g., the expansion of logarithms
into series) could also have been used.

we have

$$\log R = \frac{\displaystyle\sum_{i=1}^{n-1}\sum_{j=i+1}^{n}
  \begin{vmatrix} w_i & w_j \\ 1 & 1 \end{vmatrix}
  \begin{vmatrix} \log X_i & \log X_j \\ 1 & 1 \end{vmatrix}}{n \sum w}.$$

From the numerator of the last expression (which contains ${}_nC_2$
determinant products), it is obvious that $G_w > G$ necessarily if the rank
coefficient of correlation between the $w_i$ and the $\log X_i$ equals +1, and
that $G_w < G$ necessarily if the coefficient equals −1. It can also be
shown that, since

$$\log R = r_{w \cdot \log X}\, \sigma_w\, \sigma_{\log X}\, n \big/ \textstyle\sum w = f(r),$$

or

$$G_w = 10^{f(r)}\, G,$$

we have $G_w \gtreqless G$ according as $r_{w \cdot \log X} \gtreqless 0$. These
criteria⁴ are analogous to those derived for the difference between a
weighted arithmetic mean and an unweighted one.
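
As a numerical check on the relation between log R and the correlation coefficient, the sketch below computes log R both directly and from the product of r, the two standard deviations, and n divided by the sum of the weights. The weights and relatives are hypothetical, population (divide-by-n) moments are assumed, and the function name is introduced here for illustration only.

```python
import math


def log_ratio_unweighted(weights, relatives):
    """Return log10(G_w / G) computed two ways: directly, and via r."""
    n = len(relatives)
    logs = [math.log10(x) for x in relatives]
    sum_w = sum(weights)

    # Direct computation from the two geometric means.
    log_gw = sum(w * l for w, l in zip(weights, logs)) / sum_w
    log_g = sum(logs) / n
    direct = log_gw - log_g

    # Population moments of the w_i and the log X_i (neither may be constant).
    mean_w = sum_w / n
    mean_l = sum(logs) / n
    sigma_w = math.sqrt(sum((w - mean_w) ** 2 for w in weights) / n)
    sigma_l = math.sqrt(sum((l - mean_l) ** 2 for l in logs) / n)
    cov = sum((w - mean_w) * (l - mean_l) for w, l in zip(weights, logs)) / n
    r = cov / (sigma_w * sigma_l)

    via_correlation = r * sigma_w * sigma_l * n / sum_w
    return direct, via_correlation


# Hypothetical weights and relatives.
weights = [3.0, 1.0, 4.0, 2.0]
relatives = [1.10, 0.95, 1.30, 1.02]
direct, via_correlation = log_ratio_unweighted(weights, relatives)
print(direct, via_correlation)
```

The two values agree to rounding error, and their common sign is the sign of the correlation between the weights and the logarithms of the relatives.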
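We shall now consider the relation between two differently weighted
geometric means:

$$R' = G_{w'}/G_w = \left(\prod X^{w'}\right)^{1/\sum w'} \Big/ \left(\prod X^{w}\right)^{1/\sum w}.$$

Since

$$\log R' = \frac{\begin{vmatrix} \sum w' \log X & \sum w' \\ \sum w \log X & \sum w \end{vmatrix}}{\sum w \sum w'}$$

and

$$\begin{vmatrix} \sum w' \log X & \sum w' \\ \sum w \log X & \sum w \end{vmatrix}
  = \begin{Vmatrix} w_1' & \cdots & w_n' \\ w_1 & \cdots & w_n \end{Vmatrix}
    \cdot
    \begin{Vmatrix} \log X_1 & \cdots & \log X_n \\ 1 & \cdots & 1 \end{Vmatrix},$$

we have

$$\log R' = \frac{\displaystyle\sum_{i=1}^{n-1}\sum_{j=i+1}^{n}
  \begin{vmatrix} w_i' & w_j' \\ w_i & w_j \end{vmatrix}
  \begin{vmatrix} \log X_i & \log X_j \\ 1 & 1 \end{vmatrix}}{\sum w \sum w'},$$
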


4 Since an unweighted arithmetic mean may be written as a weighted one (see footnote 2), log R
may also be expressed in terms of a weighted correlation coefficient and weighted standard deviations.

5 W. V. Lovitt has investigated the sign of the difference between many index-number formulas,
including geometric means (see Cowles Commission: Report of Third Annual Research Conference, 1937,
pp. 85-87). His investigation was restricted, however, to indexes with the four sets of weights commonly
applied to price relatives. Our discussion is more general and embraces Lovitt’s results as special cases.

Incidentally, an arithmetic and a geometric mean with the same set of weights are improperly
included in Lovitt's list of index pairs for which the inequality sign is “not permanently directed.” (An
arithmetic and harmonic mean with the same set of weights are also improperly included.)

which may also be written as

$$\log R' = r_{w:(w'/w) \cdot \log X}\; \sigma_{w:(w'/w)}\; \sigma_{w:\log X}\; \textstyle\sum w \big/ \sum w' = f(r').$$

From the last two expressions, it is evident that $G_{w'} > G_w$ necessarily if
the rank coefficient of correlation between the $w_i'/w_i$ and the $\log X_i$
equals +1; that $G_{w'} < G_w$ necessarily if the coefficient equals −1;
and, more generally, that the sign of the difference between the two
indexes corresponds to the sign of the weighted correlation coefficient
$r_{w:(w'/w) \cdot \log X}$. These criteria are analogous to those shown in the
earlier papers for the difference between two arithmetic means with
dissimilar weights.
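
The weighted form of the criterion can be checked numerically in the same manner. In the sketch below the moments are weighted by the w_i, the two variables being the ratios w_i'/w_i and the log X_i; the data and the helper names are hypothetical and serve only to illustrate the identity.

```python
import math


def weighted_mean(values, weights):
    """Mean of values with the given weights."""
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def log_ratio_two_weightings(w_prime, w, relatives):
    """Return log10(G_w' / G_w) computed directly and via weighted moments."""
    logs = [math.log10(x) for x in relatives]
    direct = weighted_mean(logs, w_prime) - weighted_mean(logs, w)

    # Weighted covariance (weights w) between w'/w and log X; this equals
    # r * sigma * sigma in the notation of the text.
    ratios = [wp / wi for wp, wi in zip(w_prime, w)]
    mean_ratio = weighted_mean(ratios, w)
    mean_log = weighted_mean(logs, w)
    products = [(a - mean_ratio) * (b - mean_log) for a, b in zip(ratios, logs)]
    covariance = weighted_mean(products, w)

    via_correlation = covariance * sum(w) / sum(w_prime)
    return direct, via_correlation


# Hypothetical weights and relatives.
w = [3.0, 1.0, 4.0, 2.0]
w_prime = [2.0, 2.0, 1.0, 5.0]
relatives = [1.10, 0.95, 1.30, 1.02]
direct, via_correlation = log_ratio_two_weightings(w_prime, w, relatives)
print(direct, via_correlation)
```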

Although an arithmetic mean is greater than a similarly weighted
geometric mean (if some of the relatives are not equal), it is not
necessarily greater than a geometric mean with different weights.⁶
Designating the arithmetic mean by $M_w$, the geometric mean by $G_{w'}$,
and the ratio by $R''$, we have

$$R'' = M_w/G_{w'} = \frac{\sum wX}{\sum w} \Big/ \left(\prod X^{w'}\right)^{1/\sum w'},$$

whence

$$\log R'' = \log\left(\frac{\sum wX}{\sum w}\right) - \frac{\sum w \log X}{\sum w}
  - r_{w:(w'/w) \cdot \log X}\; \sigma_{w:(w'/w)}\; \sigma_{w:\log X}\; \textstyle\sum w \big/ \sum w',$$

or

$$R'' = M_w \big/ \left(G_w \cdot 10^{f(r')}\right).$$

Since $\log(\sum wX/\sum w) > \sum w \log X/\sum w$ (if the $X_i$ are not all
equal), it is clear that $R'' > 1$ (i.e., $M_w > G_{w'}$) if
$r_{w:(w'/w) \cdot \log X} \le 0$. A positive correlation coefficient, however,
is consistent with values of $R''$ greater or less than unity.

Like its components, the Paasche and Laspeyres indexes, the “ideal”
index need not be greater than a weighted geometric mean of relatives.
We first consider the relation between the “ideal” index,
$I = (P_X L_X)^{1/2}$, and $G_m = (\prod X^{m})^{1/\sum m}$; $P_X$ and $L_X$
represent the Paasche and Laspeyres indexes of the $X_i = g_i'/g_i$, and the
$m_i$ are the weights in both the

6 Thus, an “unadjusted” industry production index of the kind shown by S. Fabricant, Output of
Manufacturing Industries: 1899-1937 (1940), need not exceed an index computed for the same products
by the Day-Thomas formula. The Fabricant measures are based on the Edgeworth formula; the
Day-Thomas index for the time $t_i$ is a geometric mean of the production relatives weighted by averages
of the money values for $t_i$ and the base period, $t_0$.

Laspeyres index and the geometric mean. If the $X_i$ are not all equal, it
may readily be shown that $I > G_m$ necessarily when $P_X \ge L_X$, or, more
generally, when $r_{m:Y \cdot \log X} \ge 0$. When the correlation coefficient is
negative, the inequality sign between the two indexes is not definitely
directed. If, instead of $G_m$, we take $G_{mY} = (\prod X^{mY})^{1/\sum mY}$
(where the $m_iY_i$ are the weights in $P_X$), then $I > G_{mY}$ when
$L_X \ge P_X$, or, more generally, when $r_{m:Y \cdot \log X} \le 0$. When the
correlation coefficient is positive, the inequality sign between the
indexes is not definitely directed. If, finally, we take
$G_w = (\prod X^{w})^{1/\sum w}$, then $I > G_w$ necessarily when both
$r_{mY:(w/mY) \cdot \log X}$ and $r_{m:(w/m) \cdot \log X}$ are zero or negative.

The difference between the “ideal” index and other measures
considered in the earlier papers requires little comment. Its relation to
its components, the Paasche and Laspeyres indexes, is obvious; and
Professor Fisher has conclusively shown how close it is to the Edgeworth
index.⁷ Other cases may readily be investigated by the methods
employed here and in the earlier papers.


7 See I. Fisher, Making of Index Numbers (1922), pp. 428-430; H. T. Davis and W. F. C. Nelson, 
Elements of Statistics (1937), pp. 111-112; and H. T. Davis, Theory of Econometrics (1941), p. 328. 











MATHEMATICAL OPERATIONS WITH PUNCHED CARDS* 


By J. C. McPHERSON 
International Business Machines Corporation 


In the past fifty years we have seen a very significant change in the
extent and complexity of computations required to apply mathematical
formulae to concrete situations. The early laws of physics,
mechanics and chemistry were expressed very frequently by
mathematical expressions for which the computations were fairly simple,
and well within the range of the mechanical devices, including tables,
then available.

More recently we have been faced, as higher branches of mathe- 
matics have been called into use in solving other phenomena, with an 
immense increase in the amount of labor required to compute results 
under the mathematical expressions developed for this work. As an 
example, we might mention boundary value problems and the use of 
determinants in analyzing statistical correlations and for solving elec- 
trical networks. While the mathematical expressions are simple, the 
actual labor of carrying out the computations indicated can be a mat- 
ter of weeks of work. 

There thus arises a further course for mathematical study, the de- 
velopment of mathematical expressions or expansions whose computa- 
tion can be effected by the means at our disposal. This involves deter- 
mining the relative simplicity in use of various mathematical forms 
and the establishment of additional information regarding the accuracy 
and limits of error of various processes which can be carried out by the 
devices which we now have and can project. 

It is my purpose in this paper to describe briefly one of the more 
powerful but little used tools for extensive mathematical computation 
in order to point out its general function and its present application 
to computing problems. 

Punched-card tabulating machines, because they require no further
manual work than the original entry of the problem on punched cards
which afterwards actuate automatic machines, form the most powerful
tool yet devised for the performance of mathematical computations. As
yet the full capabilities of the automatic punched-card method have not
been achieved in scientific fields, except in isolated instances in widely
separated fields of activity. In bringing before you the results of the use
of machines in the various fields of science, it is hoped that many
additional uses for machines will be developed and that thinking in
machine terms will be greatly stimulated.


* A paper presented at the 103rd Annual Meeting of the American Statistical Association in joint
session with the Institute of Mathematical Statistics, New York, December 28, 1941.

Electrical punched-card accounting machines were developed pri- 
marily as a result of the analytical demands of the census and their 
great growth has been due to their usefulness in handling statistical 
and accounting procedures. This has led to the development of a spe- 
cific series of machines actuated by the punched-cards on a functional 
basis, i.e., each machine designed to perform a specific function and 
record its results in form for further automatic machine handling if 
desired. 

The basic feature of tabulating equipment is its ability to read 
punched holes and perform the computations indicated by the holes, 
recording the results in printed or punched-hole form for subsequent 
processing. These machines read a line at a time and with automatic 
reading of the cards goes a high speed of computing and handling of 
individual problems. Each machine has the ability to handle tabulating 
cards regardless of arrangement of data. 

There are some six punched-card machines whose usefulness in han- 
dling mathematical work has been demonstrated. For computing work, 
the automatic Multiplier is perhaps the most useful. This machine reads 
multiplication problems, performs the computations, and punches the 
answer back in the card on which the problem is stated. The Multiplier 
can also make cross additions or subtractions while multiplying, thus 
performing in a single step such operations as linear interpolation. Its 
operation is completely automatic and at a speed several times that of a 
clerk with a computing machine. On such work as 8 by 8 multiplica- 
tions, machine speed is 750 multiplications per hour and on smaller 
problems speeds up to 1,500 are obtained. 

Of next importance is the machine termed the Reproducer. This 
machine can transfer all or part of the punched information on one 
card or set of cards to another set of cards at the fixed speed of 100 
cards per minute. It is used, for example, in making copies of punched- 
card tables or parts of tables; for combining intermediate results com- 
puted on different sets of cards onto a single card for further process- 
ing; and for transferring data from one set of cards to another. The 
Reproducer is unique in its ability to copy information from one docu- 
ment to another. At 6,000 cards per hour we are able to reproduce, re- 
arrange, or extract information from punched-card records. 

An automatic Sorting Machine is available for rapidly rearranging a
set of cards into another sequence or for bringing together all cards
carrying a similar punched-card designation. It operates at 400 cards
per minute.

Another card arranging machine is the Collator, which can
interleave two separate files of cards into a single file; or select cards
from the one file matching cards in the other file; or select all cards
greater than a certain value, or less than a certain value, or falling
between specified limits. This machine is used to select cards from a
table file, and to refile the selected cards after use.

The principal punched-card machine is called the Electric Accounting
Machine. It is a giant printing adding machine actuated by the
passage of the cards through a card feed at speeds as high as 150 cards a
minute. It has the ability to add, subtract, or eliminate amounts
punched in one or several fields of the cards passing through it. It
automatically adds all cards having a common designation; and at the
end of the group, which it determines automatically, it prints the total
and punches a new card with the group designation and the group total
on an automatic punch electrically connected to the accounting
machine. The machine has a maximum adding capacity of 80 digits.
These adding wheels may be grouped at will into counters of varying
size and several factors may be added simultaneously. This machine is
the commercial version of Babbage’s “Difference Engine,” capable of
operation over any number of orders of differences, with a counter large
enough to handle figures of practically any required size.
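
The group-total operation described above (add all cards with a common designation, then emit one summary card per group) can be sketched in modern terms as follows. The card layout, a pair of designation and amount, and the function name `group_totals` are assumptions made for illustration; as with the machine, the cards are taken to be already sorted by designation.

```python
from itertools import groupby


def group_totals(cards):
    """Return one (designation, total) summary card per group.

    cards: iterable of (designation, amount) pairs, already sorted so that
    cards with a common designation are adjacent.
    """
    return [
        (designation, sum(amount for _, amount in group))
        for designation, group in groupby(cards, key=lambda card: card[0])
    ]


# Hypothetical detail cards.
detail_cards = [("A", 10), ("A", 5), ("B", 7), ("C", 1), ("C", 2)]
summary_cards = group_totals(detail_cards)
print(summary_cards)
```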

There are several kinds of punches available for originally recording 
data on the punched cards, and they are designed for rapid operation. 
There are also Verifiers for checking punching by a second recording 
of the data and Interpreters for printing on the cards the information 
punched in the card. 

The use of the punched-card method for mathematical computation 
involves the use of one or more of the machines briefly described above. 
Some of the techniques require only the use of the Punches, Sorter and 
the Electric Accounting Machine, while other operations will call for 
the use at some point of all of the machines described above. 

Tables. An extremely important use for the punched-card equipment 
is the preparation of tables. The equipment is so powerful in this re- 
spect that it has been said that every computational problem should 
be examined to see to what extent special tables prepared by machine 
can be used in its solution. This statement applies not only to processes 
conducted entirely by the machine method, but also to extensive prob- 
lems where the special tables can be prepared by the machine and then 
used by computers in their further work. Even in so simple a thing as
making a linear table, in one instance the first thousand multiples of
each of four 20-figure numbers were produced in 4 hours, i.e., in about
one-tenth of the time in which they could have been copied by hand.

The Electric Accounting Machine and Summary Punch are used in 
the preparation of tables using the method of differences. The process 
is covered in detail in a recent paper “On the Mechanical Tabulation 
of Polynomials,” appearing in the September, 1941, issue of the Annals 
of Mathematical Statistics. 
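
The method of differences builds each new tabular value by addition alone: every difference is added into the one of next lower order, and the lowest accumulates the function itself. The sketch below illustrates the principle for a polynomial; it is not a description of the machine wiring, and the function name and the example polynomial are chosen here for illustration.

```python
def tabulate_by_differences(initial_differences, count):
    """Tabulate a polynomial at equal intervals using additions only.

    initial_differences: [f(x0), first difference, second difference, ...]
    taken at the starting argument. For a polynomial of degree d, supply
    d + 1 entries; the last (d-th) difference is constant.
    """
    diffs = list(initial_differences)
    values = []
    for _ in range(count):
        values.append(diffs[0])
        # Add each higher-order difference into the order below it,
        # working upward so that each addition uses the old higher value.
        for k in range(len(diffs) - 1):
            diffs[k] += diffs[k + 1]
    return values


# Example: f(x) = x**3 - 2*x + 1 at x = 0, 1, 2, ...
# f(0) = 1, first difference -1, second difference 6, third difference 6.
table = tabulate_by_differences([1, -1, 6, 6], 10)
print(table)
```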

Interpolation of tables is frequently performed by aid of punched- 
card techniques. On systematic interpolation, a paper by L. J. Comrie 
in 1928 on “Construction of Tables by Interpolation” explains the 
preparation of subdivided tables by computing the last digit of each 
interpolated value exactly by the aid of a set of prepunched cards. 
These figures are then differenced until they are smooth and the entire 
value of the differences inferred. From these differences, the Electric 
Accounting Machine will then automatically construct the subdivided 
values of the function. 

Interpolation by use of the Lagrangian formulas can be readily ac- 
complished with the Multiplier and Electric Accounting Machine. The 
details of the machine process are fully described in Dr. Eckert’s book 
Punched Card Methods in Scientific Computation. 

The Electric Accounting Machine can be used to difference a 
punched-card table, directly computing and printing both first and 
second differences in a single, high-speed operation—2,400 per hour. 

One use of punched-card tables is for the automatic application of 
values of functions to problem cards. This is done by sorting the prob- 
lem cards in order according to the argument of the table and then 
automatically selecting the proper table cards with the Collator. The 
Reproducer then punches the data from the table onto the problem 
cards. If interpolation is required it is done by the Multiplier. 
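
In modern terms the sequence is a sort, a matched lookup, and an optional linear interpolation. The following sketch illustrates that sequence under assumed data structures (a table held as a sorted list of argument and value pairs, and problem cards held as dictionaries); none of these names comes from the paper.

```python
from bisect import bisect_right


def lookup(table, argument):
    """Value of the tabulated function at argument, interpolating linearly.

    table: list of (argument, value) pairs sorted by argument.
    """
    arguments = [a for a, _ in table]
    if argument < arguments[0] or argument > arguments[-1]:
        raise ValueError("argument outside the range of the table")
    i = bisect_right(arguments, argument) - 1
    a0, v0 = table[i]
    if a0 == argument:
        return v0                      # exact table card found
    a1, v1 = table[i + 1]
    return v0 + (v1 - v0) * (argument - a0) / (a1 - a0)


def apply_table(problem_cards, table):
    """Sort the problem cards by argument and attach the function value."""
    ordered = sorted(problem_cards, key=lambda card: card["argument"])
    return [{**card, "value": lookup(table, card["argument"])} for card in ordered]


# Hypothetical table and problem cards.
table = [(0, 0.0), (10, 100.0), (20, 400.0)]
problems = [{"id": 1, "argument": 15}, {"id": 2, "argument": 10}]
print(apply_table(problems, table))
```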

The table-making process is useful in statistics for converting raw 
scores. After the mean and standard deviation have been determined a 
linear equation of the form Y=AX+B can be established and a 
punched-card table made for this function. The punched-card table is 
then sorted ahead of the raw score cards and the converted score gang 
punched into all cards. 
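
A sketch of the same conversion in Python follows. The paper gives only the form Y = AX + B; the particular choice of A and B below, which rescales the scores to a target mean of 50 and standard deviation of 10, is an assumption made for the example, as are the function name and the sample scores.

```python
import statistics


def conversion_table(raw_scores, target_mean=50.0, target_sd=10.0):
    """Build a table of converted scores Y = A*X + B for each distinct raw score.

    A and B are chosen (as an assumption of this sketch) so that the
    converted scores have the target mean and standard deviation.
    """
    mean = statistics.fmean(raw_scores)
    sd = statistics.pstdev(raw_scores)
    a = target_sd / sd
    b = target_mean - a * mean
    return {x: a * x + b for x in sorted(set(raw_scores))}


# Hypothetical raw scores.
raw = [40, 50, 60, 50, 70, 30]
table = conversion_table(raw)
converted = [table[x] for x in raw]   # one lookup per score card
print(converted)
```

Because the table holds one entry per distinct raw score, each conversion is computed once and then copied to every card bearing that score, which is the saving the gang-punching step provides.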

Harmonic Functions. Several of the most important uses of the
machines have been in connection with the synthesis and analysis of
harmonic functions. A technical paper on the use of Hollerith machines
for synthesis of harmonic series appeared in 1932. This paper, appearing
in the Monthly Notices of the Royal Astronomical Society, presented by
Mr. L. J. Comrie, described the method by which the many coefficients
in Brown’s tables of the moon were combined into an orbit for the
moon carried out to the year 2000. This was the most difficult case of
harmonic synthesis where the periods of the various components were
not commensurable. This paper explains the method by which
punched-card tables were prepared which took this fact into account and
permitted the synthesis to proceed on a mechanical basis.

The use of cards for harmonic synthesis is indicated by the recurrent 
use of the same component values in different arrangements as the pe- 
riods repeat. The preliminary card preparation consists in determining 
the interval of the desired synthesis and then computing the values of 
each term throughout its period for values of the argument at the de- 
sired interval. These values can be computed initially with the aid of a 
punched-card table and the multiplying punch. 

The cards for each term are then placed in stacks on a table and the 
top card of each stack picked up and totaled in the Electric Account- 
ing Machine. Checks on the proper selection of cards can be secured 
by adding card numbers as well as the coefficients. This addition can 
be performed automatically by placing a special card ahead of each 
group of cards for a distinct argument. 

After tabulation, the cards for the various terms are separated by a 
run through the Sorter and replaced behind the unused cards for each 
respective term. 
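
The card procedure can be imitated as follows: one deck per term holds the values of that term over a single period, and each point of the synthesis is the sum of the current top cards, the decks being reused as the periods repeat. The sketch assumes the simple case in which every period is a whole number of tabular intervals; the function names and the two example terms are hypothetical.

```python
import math


def make_deck(amplitude, period_steps, phase=0.0):
    """Values of one harmonic term over a single period, one card per step."""
    return [
        amplitude * math.sin(2 * math.pi * k / period_steps + phase)
        for k in range(period_steps)
    ]


def synthesize(decks, steps):
    """Sum the current card of every deck at each step, cycling each deck."""
    return [sum(deck[t % len(deck)] for deck in decks) for t in range(steps)]


# Two hypothetical terms with periods of 12 and 8 tabular intervals.
decks = [make_deck(3.0, 12), make_deck(1.5, 8, phase=0.5)]
series = synthesize(decks, 48)
print([round(v, 3) for v in series[:6]])
```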

For repeated synthesis of Fourier series, prepunched decks which can
be combined to produce any amplitude of each frequency are used. Such
a deck has been used extensively at California Institute of Technology;
it computes points at intervals of 1/500 of a circle and goes up to a
frequency of 30.

In the analysis of harmonic series, as distinct from synthesis, the
Multiplier and Tabulator combine to give a most effective method of
reducing the manual effort involved. These analyses are made with the
aid of a set of prepunched cards. These cards are prepunched with the
value of the trigonometric function and a pattern showing in successive
columns whether the product formed on that card is to be added,
subtracted or eliminated in computing the successive amplitude
coefficients.

One interesting possibility in the use of cards for harmonic synthesis 
which was suggested by Comrie is that a whole series of harmonic syn- 
theses involving the same periods but varying amplitudes can be tabu- 
lated from a double set of cards. By combining the original and dupli- 
cate sets of cards out of phase sufficiently, any desired amplitude from 
zero to twice the amplitude of the set may be produced. 
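
The possibility attributed to Comrie presumably rests on the usual sum-to-product identity. As a sketch (the symbols A, θ and δ are introduced here and do not appear in the original), adding two identical sets of amplitude A displaced by a phase δ gives

```latex
A\cos\theta + A\cos(\theta + \delta)
  = 2A\cos\!\left(\tfrac{\delta}{2}\right)\cos\!\left(\theta + \tfrac{\delta}{2}\right)
```

so the resultant amplitude 2A cos(δ/2) runs from 2A at δ = 0 down to zero at δ = π, while the resultant itself is shifted in phase by δ/2.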

Progressive Digiting. Tabulating machines have been recognized quite
widely as an efficient means of computing the sums of products needed
in computation of multiple correlations and least square trend lines and 
other statistical and computational problems. This work can be per- 
formed with a sorter and the Electric Accounting Machine. The process 
is such that a number of cross products may be handled simultaneously 
in separate counter groups of the Electric Accounting Machine. 

All the factors which are to be multiplied together are punched on 
cards, each card carrying the related data of a single case. The sums of 
the squares and of the cross products are obtained by a method of 
multiplication by addition. This process handles one multiplier digit at 
a time and is extremely rapid. Comrie states that “on one occasion 
25,000 products of three-figure numbers were formed and added in 
about three hours.” This method of multiplication is probably the fast-
est known today. Multiplication takes place at the same speed as addi-
tion and many products may be accumulated at one time.
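
The text does not spell out the individual steps, so the following Python sketch shows the form in which multiplication by addition is commonly described, and should be read as an assumption rather than as the exact machine routine: cards are grouped by one multiplier digit, fed in the order 9, 8, ..., 1, and the running total of the multiplicands is itself added after each group. The function names and the three-digit limit are illustrative only.

```python
def digit_sum_of_products(pairs, place):
    """Sum of multiplicand * (multiplier digit at 10**place), by additions only."""
    groups = {d: 0 for d in range(10)}
    for multiplicand, multiplier in pairs:
        digit = (multiplier // 10 ** place) % 10
        groups[digit] += multiplicand      # the sort: cards grouped by this digit
    running = 0
    total = 0
    for digit in range(9, 0, -1):          # feed the groups 9, 8, ..., 1
        running += groups[digit]           # progressive total of multiplicands
        total += running                   # progressive total added once per group
    return total

def sum_of_products(pairs, n_digits=3):
    """Combine the digit passes; each factor 10**place is only a column shift."""
    return sum(digit_sum_of_products(pairs, place) * 10 ** place
               for place in range(n_digits))

pairs = [(123, 456), (7, 89), (250, 999), (31, 0)]
result = sum_of_products(pairs)
```

A multiplicand in the group for digit d is carried in the running total for d passes, so it is counted d times without any multiplication being performed.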

A development of this method of multiplying by addition is now in 
use where the multiplication is done without sorting. In this process an 
analyzing device on the Electric Accounting Machine analyzes the 
digits of the multiplier column, adding or subtracting the multiplicand 
into one or more of three counters assigned values of 1, 3 and 5. For 
example, the digit 4 adds into counters 1 and 3, the digit 6 into counters 
1 and 5, and digit 7 adds into counters 3 and 5 and is subtracted in 
counter 1, etc. These totals are summary punched; the total of the 5
counter is then multiplied by 5 and that of the 3 counter by 3, and the
sum of the 5, 3, and 1 results is cross-footed on the Multiplier.
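
The counter scheme can be checked with a short Python sketch. Only the routings for the digits 4, 6, and 7 are given in the text; the other rows of the table below are a completion assumed here so that every digit is built from the values 1, 3, and 5, and the names ROUTING and digit_products are hypothetical.

```python
# digit -> sign applied to the multiplicand in counters (1, 3, 5).
# Rows 4, 6 and 7 follow the text; the remaining rows are assumed.
ROUTING = {
    0: (0, 0, 0), 1: (1, 0, 0), 2: (-1, 1, 0), 3: (0, 1, 0), 4: (1, 1, 0),
    5: (0, 0, 1), 6: (1, 0, 1), 7: (-1, 1, 1), 8: (0, 1, 1), 9: (1, 1, 1),
}

def digit_products(pairs, place=0):
    """Sum of multiplicand * (multiplier digit) using three counters, no sorting."""
    c1 = c3 = c5 = 0
    for multiplicand, multiplier in pairs:
        s1, s3, s5 = ROUTING[(multiplier // 10 ** place) % 10]
        c1 += s1 * multiplicand            # add, subtract, or skip
        c3 += s3 * multiplicand
        c5 += s5 * multiplicand
    # Final step: weight the counter totals by 1, 3 and 5 and cross-foot.
    return c1 + 3 * c3 + 5 * c5

pairs = [(10 + d, d) for d in range(10)]
total = digit_products(pairs)
```

Only two true multiplications remain per digit column (by 3 and by 5), however many cards are run.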

Evaluation of Determinants. The solution of determinants, particu- 
larly of the higher orders, is one which is particularly burdensome when 
done manually. It is a problem to which the punched-card method has 
been applied for elimination of the manual labor. The method involves 
the use of the Multiplier, the Reproducer, and Sorter, and parallels the 
short method of single division usually followed under manual methods. 

A card is punched for each element of the determinant identified by
its row and column. The reciprocal of the element a₁₁ is punched in
card a₁₁ and used as a group multiplier for all the cards in row 1. The
cards for the remaining rows are offset gang punched transferring the
value of the element of column 1 to the remaining cards for each row.
The cards for row 1 are then sorted as group multiplier cards ahead of
the remaining cards by column. The cards are then group multiplied
for the reduction a − b × c. This routine has reduced the determinant by
one order and is repeated until the determinant can be evaluated at
sight.

This method is entirely general, very rapid, and involves a limited
number of simple machine operations. The identical process can be 
used for the solution of simultaneous equations. 
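
The reduction described above appears to be ordinary pivotal condensation, and on that assumption it can be sketched in Python as follows. Exact fractions stand in for the punched values, the function name is illustrative, and no provision is made for a zero leading element, which would call for a rearrangement of rows or columns first.

```python
from fractions import Fraction

def determinant(matrix):
    """Reduce the order by one at each pass until a single element remains."""
    m = [[Fraction(x) for x in row] for row in matrix]
    det = Fraction(1)
    while len(m) > 1:
        pivot = m[0][0]
        if pivot == 0:
            raise ZeroDivisionError("zero leading element: reorder rows or columns")
        det *= pivot
        # Row 1 multiplied by the reciprocal of its leading element.
        first = [x / pivot for x in m[0]]
        # Every remaining element a becomes a - b*c, where b is the column-1
        # element of its own row and c the scaled row-1 element of its column.
        m = [[row[j] - row[0] * first[j] for j in range(1, len(row))]
             for row in m[1:]]
    return det * m[0][0]

value = determinant([[2, 1, 3], [0, -1, 4], [1, 2, 5]])
```

The determinant is the product of the successive leading elements and the last remaining element; carrying a column of right-hand sides through the same passes gives the elimination step for simultaneous equations.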

Thus far we have discussed basic principles of punched-card meth- 
ods for performing mathematical computations. In addition to the spe- 
cific purposes of these basic principles there is a wide variety of prob- 
lems which can be solved by combinations or repeated application of 
the basic punched-card steps. I will but briefly point out a few such 
problems which have been successfully attacked and solved in this 
manner. 

One of the outstanding applications of the punched-card method is 
its application to the solution of differential equations which was de- 
veloped by Dr. W. J. Eckert at Columbia. This machine procedure is 
extremely effective and we expect to see its application carried into 
many fields. Of particular significance in connection with this use of 
machines as compared with other methods is the degree of accuracy 
which can be established. 

A simple but useful application of the Electric Accounting Machine 
is the preparation of scatter diagrams. The punched-card technique for 
factor analysis has been worked and successfully applied in Chicago. 

Other major machine applications have been in the evaluation of 
formulae, for instance, the transformation of spherical coordinates 
into rectangular coordinates. 

Another very extensive computation now being performed is a bivari- 
ate linear interpolation where a series of multiplier operations deter- 
mines the weight to be given each of the four surrounding known values 
of the function and performs the final evaluation. 
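
As an illustration of the weights involved, here is a minimal Python sketch of bivariate linear interpolation on one cell of a table. The argument names and the ordering of the four corner values are choices made here, not taken from the original computation.

```python
def bilinear(x, y, x0, x1, y0, y1, f00, f10, f01, f11):
    """Interpolate inside the cell [x0, x1] by [y0, y1].

    f00 = f(x0, y0), f10 = f(x1, y0), f01 = f(x0, y1), f11 = f(x1, y1).
    """
    u = (x - x0) / (x1 - x0)               # fractional position in x
    v = (y - y0) / (y1 - y0)               # fractional position in y
    # One weight per surrounding known value; the four weights sum to 1.
    weights = ((1 - u) * (1 - v), u * (1 - v), (1 - u) * v, u * v)
    values = (f00, f10, f01, f11)
    return sum(w * f for w, f in zip(weights, values))

# Corner values of f(x, y) = 2x + 3y + 1 on the unit square.
estimate = bilinear(0.25, 0.5, 0.0, 1.0, 0.0, 1.0, 1.0, 3.0, 4.0, 6.0)
```

Each weight is a product of two factors, so the whole evaluation reduces to a short series of multiplications followed by one summation.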

It should be clear that punched-card methods may be applied to 
many computational problems extensive enough to warrant mechaniza- 
tion. The three fundamental mathematical operations into which al- 
most all computational problems can be transformed, namely, evalua- 
tion of determinants, evaluation of harmonic series, and evaluation of 
polynomials can be performed by these methods. Much of the pre- 
liminary work in applying punched-card methods to scientific computa- 
tion has already been done by pioneers in this field. The task now be- 
fore us is to exploit intensively the new methods and more efficient 
tools they have tested for us. 








A COMPUTATIONAL SHORT CUT FOR REGRESSIONS
BASED ON UNEQUAL FREQUENCIES¹


By Marion M. Sandomire
Office of the Inspector-General, War Department 


THE REGRESSION of a set of quantities with unequal frequencies on
equally spaced values of the independent variable
may be computed by the ingenious method given in R. A. Fisher’s 
Statistical Methods for Research Workers, Ed. 7, pp. 168-176. This 
method, in that it requires a number of successive additions, involves 
a great amount of writing. While it is true, as stated on page 170, 
“that much labour is saved by choosing a ‘working zero’,” it is of inter- 
est to eliminate all manual labor possible. 

Since a method making use of orthogonal polynomials is to be pre- 
ferred to one requiring cumulative frequencies, it is assumed that or- 
thogonal polynomials are used in problems with equal frequencies. With 
unequal frequencies, short series with low degree regressions might pos- 
sibly be written out without too much effort. With a large number of 
very long sets of observations, punch card equipment might be em- 
ployed for obtaining the cumulative frequencies. However, without this 
equipment a calculating machine, especially one with multiplying keys, 
may be put to use most advantageously. 

The following table gives factors that may be used with the values of 
the dependent variable and the frequencies to obtain successive cu- 
mulative sums without the intermediate recording that is shown in 
Fisher’s tables 30.3 and 30.4. The method of constructing the table is 
apparent from the numbers themselves. A formal illustration follows for a
simplified case with four quantities above the working zero and four
below. By writing out the successive additions
above and below the zero line, quantities are obtained which are seen 
to be sums of products of the original values and the factors shown in 
the table. 








        0            1             2
d       d            d             d
c       c+d          c+2d          c+3d
b       b+c+d        b+2c+3d       b+3c+6d
a       a+b+c+d      a+2b+3c+4d    a+3b+6c+10d
A       A+B+C+D
B       B+C+D        B+2C+3D
C       C+D          C+2D          C+3D
D       D            D             D





1 The procedure described here was designed and applied when the writer was connected with the
U. S. Forest Service.

MULTIPLIERS FOR OBTAINING CUMULATIVE SUMS














0    1     2     3      4       5       6
1   17   153   969   4845   20349   74613
1   16   136   816   3876   15504   54264
1   15   120   680   3060   11628   38760
1   14   105   560   2380    8568   27132
1   13    91   455   1820    6188   18564
1   12    78   364   1365    4368   12376
1   11    66   286   1001    3003    8008
1   10    55   220    715    2002    5005
1    9    45   165    495    1287    3003
1    8    36   120    330     792    1716
1    7    28    84    210     462     924
1    6    21    56    126     252     462
1    5    15    35     70     126     210
1    4    10    20     35      56      84
1    3     6    10     15      21      28
1    2     3     4      5       6       7
1    1     1     1      1       1       1

1
1    1
1    2     1
1    3     3     1
1    4     6     4      1
1    5    10    10      5       1
1    6    15    20     15       6       1
1    7    21    35     35      21       7
1    8    28    56     70      56      28
1    9    36    84    126     126      84
1   10    45   120    210     252     210
1   11    55   165    330     462     462
1   12    66   220    495     792     924
1   13    78   286    715    1287    1716
1   14    91   364   1001    2002    3003
1   15   105   455   1365    3003    5005
1   16   120   560   1820    4368    8008





The appearance of these factors from this arithmetic process formed 
the basis for proof, originally,² for this summation method of obtaining
requisite products involving powers of the independent variable. This 
example is sufficiently general to indicate the method of constructing 
the entire table (which might be regarded as cumulations of unit fre- 
quencies). 
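
The equivalence between repeated summation and the tabled factors can be verified with a short Python sketch for the part of the table above the zero line. The indexing is an assumption made here: values[0] is the quantity nearest the working zero, column k is counted from 0, and the function names are illustrative.

```python
from itertools import accumulate
from math import comb

def column_by_summation(values, k):
    """Final entry of column k, obtained by successive cumulative additions."""
    col = list(reversed(values))           # begin with the value farthest from zero
    for _ in range(k + 1):
        col = list(accumulate(col))        # one more round of running sums
    return col[-1]

def column_by_factors(values, k):
    """The same quantity as a single sum of products with the tabled factors."""
    return sum(comb(i + k, k) * v for i, v in enumerate(values))

values = [5, 3, 8, 2]                      # a, b, c, d of the illustration
check = column_by_summation(values, 2)     # a + 3b + 6c + 10d
```

With a calculating machine the second form is the useful one, since the products are accumulated in the register and only the final sum for each column is written down.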

2 Proposed by G. F. Hardy and described by W. P. Elderton in Chap. 3 of his Frequency Curves and
Correlation, 1938.

The final quantities obtained in the forward and backward summing
may be recorded, and then their sums and differences computed as in-
dicated by Fisher (p. 171). However, it is readily seen that these results
may be obtained directly in the calculating machine by taking the sign
of the factors as negative in the odd columns above the zero line, and
recording only the final result for each column. Using Fisher’s data, for
example, we obtain:


 1·6 +  1·16 + ··· + 1·71 + 1·87 + 1·66 + 1·54 + ··· +   1·8 +   1·6 =   988
 9·6 +  8·16 + ··· + 1·71        − 1·66 − 2·54 − ··· −   9·8 −  10·6 =   991
36·6 + 28·16 + ··· + 1·98        + 1·66 + 3·54 + ··· +  45·8 +  55·6 =  9054
84·6 + 56·16 + ··· + 1·84        − 1·66 − 4·54 − ··· − 165·8 − 220·6 = −3640


and so on, for the remaining quantities. 

If the factors are used as multipliers and accumulated, a check on 
their sum is obtained in the upper part of the table from the value found 
in the next column on the same line as the last factor used. In the lower 
part of the table, reference may be made to the above-mentioned value 
or to the value in the next column and one line below. 
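
These checks follow from the standard summation identities for binomial coefficients. Written with indices introduced here for illustration (i counts lines upward from the zero line, n counts lines downward from it, and k is the column heading), the factors are C(i+k, k) in the upper part and C(n, k) in the lower part, and

```latex
\sum_{i=0}^{n} \binom{i+k}{k} = \binom{n+k+1}{k+1},
\qquad
\sum_{m=k}^{n} \binom{m}{k} = \binom{n+1}{k+1}.
```

For example, in the upper part the factors 1, 2, 3, 4 of column 1 sum to 10, the entry of column 2 on the line 1 4 10 20; in the lower part the factors 1, 3, 6, 10 of column 2 sum to 20, the entry of column 3 one line below, on the line 1 6 15 20 15 6 1.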

The table is written out in full to give the proper alignment as the 
columns drop down, and to have the factors arranged in the same order 
as the values themselves usually appear. The table is very easily ex- 
tended, as needed, for either an increased range of the independent 
variable or for a higher degree curve. 

The greatest advantage is obtained from this table of factors by 
transferring it to paper ruled in the same way as that on which the lists 
of values and frequencies are recorded. Alignment of the two permits 
the accumulation of cross products with little effort. 























OBTAINING DIFFERENCES FROM PUNCHED CARDS 


By Harry Pelle Hartkemeier and Herman E. Miller
University of Missouri 


WHEN statistical tables are available in the form of punched cards,
it is frequently desirable to obtain first differences from such a
table or set of punched cards. One method by which first differences
can be obtained has been presented by W. J. Eckert in Punched Card
Methods in Scientific Computation.¹ However, this method involves
running the cards through the numerical tabulating machine twice be- 
cause cards are tabulated in pairs, the first card in each pair being sub- 
tracted from the second card. An X must be punched in alternate cards. 
On the second run of the cards, the first card is removed in order to pair 
the second card with the third card, etc. A change must be made in the 
wiring of the X-distributor on the second run. Special wiring was used 
by Professor Eckert to obtain a break in control after every other card. 

It is possible to obtain first differences on one run of the cards with-
out any special control wiring when Type 405, the Alphabetic Account-
ing Machine,² is used. Chart I shows the wiring necessary, and parts of
the report appear in Chart II. Card columns 3, 4, 5, and 6 contain the
argument, which in this case is z/σ from 0 to 6.000. Card columns 7
through 13 contain the function,³ which is the area under the normal
curve between the arithmetic mean and the value of z/σ.

The value of the function punched in each card is added in one 
counter and subtracted from another counter. The value of the function 
punched in the next card is registered in the same two counters with 
the sign reversed by using the X punched in alternate cards to control 
an X-distributor. In this particular set of cards the X’s were punched 
above the even numbers of the digit in column 6. All cards with odd 
numbers in column 6 are NO-X cards. 

The X’s punched in column 6 are also used to obtain control breaks 
after each card and to control the selector which prints totals alternately 
from the two counters. The class of total impulse is wired through the 
total selector so that the counters will be cleared alternately at the 
proper time. This wiring illustrates the great value of controlling total 
printing or counter clearance by wiring rather than by switches, as is 
done on Types 285 and 297. 

1 This book was reviewed in this Journal, Volume 36, Number 214, June, 1941, pp. 314-315.
2 For detailed instructions and illustrations of how to operate this machine, see Principles of Punch-
Card Machine Operation, by Harry Pelle Hartkemeier, New York, Thomas Y. Crowell Co., 1942.
3 Quintic interpolation was used to compute values of the function in between those given in Karl
Pearson's Tables for Statisticians and Biometricians.

The card count connection to type bar number 1 for listing numerical
data causes zeros to be printed by type bars 2, 3, and 4. The card count 
connection to a counter enables us to register something in number 16 
numerical type bar for total printing. We prevent the type bars actu- 
ated by the card count mechanism from printing by using hammerlock 
levers 1 and 16. 

If second differences are desired, the first differences can be summary 
punched and used to obtain second differences. A check upon the 
punching can be obtained by noticing that the sum of the first differ- 
ences approaches .5000000 as a limit. Counter coupling is used in Chart 
I because some Type 405 machines have only the first 32 counters 
wired. 
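
The two-counter arrangement can be followed in a small Python simulation. It is a sketch of the logic only, not of the plugboard: the alternation of the X punch is represented by the parity of the card index, the areas are written as integers in units of 0.0000001, and the function name is invented for illustration.

```python
def first_differences(values):
    """Simulate two counters that receive each card with opposite signs."""
    counters = [0, 0]
    printed = []
    for n, value in enumerate(values):
        plus = n % 2                       # reversed on alternate cards (the X punch)
        minus = 1 - plus
        counters[plus] += value            # this counter now holds value - previous
        counters[minus] -= value           # this one holds -value for the next card
        printed.append(counters[plus])     # total printed from one counter...
        counters[plus] = 0                 # ...which is then cleared
    return printed

# Areas for the first six cards of the table, in units of 0.0000001.
areas = [0, 3989, 7979, 11968, 15958, 19947]
diffs = first_differences(areas)
```

Since the printed differences telescope, their sum equals the last area tabulated, which is the basis of the check on the punching; running the printed differences through the same routine again gives second differences.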

CHART II 


PARTS OF THE PRINTED REPORT OBTAINED FROM THE ALPHABETIC 
ACCOUNTING MACHINE 











z/σ        Area       First Difference
0.000    .0000000    .0000000
0.001    .0003989    .0003989
0.002    .0007979    .0003990
0.003    .0011968    .0003989
0.004    .0015958    .0003990
0.005    .0019947    .0003989
0.006    .0023937    .0003990
0.007    .0027926    .0003989
0.008    .0031915    .0003989
0.009    .0035905    .0003990
0.010    .0039894    .0003989
0.011    .0043883    .0003989
0.012    .0047872    .0003989
0.013    .0051861    .0003989

0.737    .2694388    .0003041
0.738    .2697428    .0003040
0.739    .2700465    .0003037
0.740    .2703500    .0003035
0.741    .2706533    .0003033
0.742    .2709563    .0003030
0.743    .2712591    .0003028
0.744    .2715617    .0003026
0.745    .2718641    .0003024
0.746    .2721663    .0003022
0.747    .2724682    .0003019
0.748    .2727699    .0003017
0.749    .2730714    .0003015
0.750    .2733726    .0003012



GLENN E. McLAUGHLIN
Review Editor 


Agricultural Price Analysis, by Geoffrey S. Shepherd. Ames, Iowa: The
Iowa State College Press. 1941. viii, 402 pp. $3.75. 


For twenty years agricultural economists have been building up a tech- 
nique of quantitative measurements of commodity supply-demand relations, 
almost unknown to economic workers in other fields. These studies have 
served as technical foundations and guides for vast public action programs 
in the field of farm production adjustment, price control, and agricultural 
planning. The number of published “price analysis” studies runs into the 
high hundreds, if not into the thousands. The books in this field have been 
few and far between, with Thomsen’s elementary Agricultural Prices¹ and
Henry Schultz’s monumental Theory and Measurement of Demand² the major
printed texts to record the generation of development that followed the first
explorations of Henry L. Moore³ a generation ago.

Shepherd presents a treatment of the subject that is sophisticated from 
the economist’s point of view. It ties back firmly to the institutional facts 
of market organization and to modern economic theory. It carries on from 
the price analyses themselves to their meaning in analyzing social action 
programs (such as the A.A.A., the Food Stamp Plan, milk price plans,
marketing agreements, and pro-rates) and even further to a consideration of
the basic problem of how full employment can be maintained in a society 
where monopolistic competition is the rule and effective competition the 
exception. 

After the excellent discussion of the market institutions within which 
farm prices are recorded, Shepherd gives a step by step discussion of the 
basic economic phases of price analysis. In this section his approach in terms 
of supply and demand curves and of their elasticities leads perhaps to an 
over-emphasis on those economic abstractions rather than to analysis of the 
fundamental problems of explaining changes in prices, changes in production, 
and changes in supply offered. The novice might wish also here for some 
clearer hints and instructions for actual research operations. For example, 
the chapter on elasticity of supply gives no clear statement of the fact that 
usually only by taking changes in production, rather than absolute amounts, 
has the agricultural supply curve been found measurable for successive pro-
duction periods. Subsequent sections, however, are quite practical. Statisti-
cians will find of special interest the clear discussions of the use of multiple
correlation and the graphic short-cut method in price analysis in Chapter 18, 
and of the significance of price analysis results in Chapter 21. 

This is a useful and stimulating book, and one that may be an eye-opener
to economists in other fields who wonder how economic functional relations
can be measured in quantitative terms. It summarizes an area of work that
is moving continuously, though fumblingly at times, to provide economics
with some of the same quantitative bases that the natural sciences have.

MORDECAI EZEKIEL
U. S. Department of Agriculture

1 F. L. Thomsen, Agricultural Prices, McGraw-Hill, 1936.
2 Henry Schultz, Theory and Measurement of Demand, University of Chicago Press, 1938.
3 Henry L. Moore, Economic Cycles, Their Law and Cause, Macmillan, 1917.


A Theoretical Analysis of Imperfect Competition with Special Application to 
the Agricultural Industries, by William H. Nicholls. Ames, Iowa: The 
Iowa State College Press. 1941. xiv, 384 pp. $3.75. 


Dr. Nicholls undertakes in this book to present the theory of monopolistic
or imperfect competition in an agricultural setting. It is essentially a text- 
book for use in graduate classes in marketing and prices; that is, it is not a 
monograph in the sense of being an exhaustive inductive study of a particu- 
lar situation. As a theoretical treatise it is not so much an attempt to carry 
further the Chamberlin-Robinson type of analysis as to restate it and, as 
Nicholls puts it, to assume the role of the tool-adapter. 

The approach used is one obviously needed in graduate instruction for 
students of agricultural marketing. Many researchers will find very helpful 
the carefully developed theoretical framework which the book presents. 
Marketing textbooks very generally have stressed the institutional organiza- 
tion of the markets and the functions performed. Empirical researches more 
often than not have dealt with agricultural marketing as a production proc- 
ess rather than as a bargaining procedure. Prices tend to be taken as faits 
accomplis related to the supplies offered by original producers and the de- 
mands arising from ultimate consumers. Emphasis is thus on basic supply 
and ultimate demand, and upon efficiency in performance of the physical 
functions of marketing. In this connection, consideration is likely to be 
given to scale of operations in the handling agency as it affects the cost of 
processing and handling. Dr. Nicholls places emphasis upon the price-mak- 
ing aspect of marketing and possibly in some measure loses sight of social 
gains as well as losses which may arise from large-scale market operations. 
Such variations in cost of operation are, to be sure, included in his graphs but 
in exceedingly schematic form. They are not much discussed. 

The fundamental interest of producers is to gain as large a portion as 
possible of what ultimate consumers can be induced to pay; of consumers to 
get their product at as little as possible above the amounts needed to induce 
production. Large-scale operations probably produce larger economies than 
most schematic presentations indicate. Granting, then, a tendency for this 
larger scale operation to open the way to certain monopolistic practices, the 
fundamental problems become those of appraising the net effects of few 
firms and large-scale operations and of ascertaining what measure of com- 
petition between them can and should be brought about, and how it is to be 
done. Nicholls recognizes this problem in a brief comment in Chapter 9 but
sets it aside rather quickly. He has, probably wisely, stuck to his main task,
though this larger over-all problem will eventually need fuller exploration 
than it has had thus far. 

Space available for this review does not permit extended comment on the 
carefully developed analysis covered. The book presents a new version of the 
Chamberlin-Robinson approach, and carries forward the attempt to apply 
the abstract theory developed by these earlier writers. The examples taken 
are more illustrative of what might be the competitive situation, or lack of 
it, than proof of what is. Many instructors and researchers, however, will 
find it a very valuable addition to their teaching and research materials. 

Not a few students of agricultural marketing problems will no doubt 
object that the refinements described run much beyond the actual knowledge 
available even to monopolists and oligopolists, and that the many rigid 
assumptions necessitated by this type of analysis remove it from reality to 
an extent that vitiates its usefulness. Nevertheless Dr. Nicholls has made 
real progress in the direction of adequate analysis of these complex problems. 
It is a scholarly and competent exposition. 

Murray R. BENEDICT 


University of California, Berkeley 


Wheat Studies of the Food Research Institute, Stanford University, Cali- 
fornia. “Wheat in National Diets,” by M. K. Bennett. Volume XVIII. 
No. 2. October 1941. pp. 37-76. $1.00. 


In 40 large pages the author has succeeded in covering a great deal of ground. Not
only did he give a comprehensive picture of the role of wheat in the diet of as 
many as 52 countries, but in addition there is a computation of the cereal- 
potato ratio to total food consumption in calories. The reviewer found this 
supplement of especially great interest. 

Bennett’s computations necessarily had to be based on data of varying 
degree of exactness, representativeness, and comparability. He had, for 
example, to use data on disappearance rather than actual consumption, i.e., 
his data include waste in trade channels and homes. It is common practice 
to make the necessary qualifications in footnotes which many do not care 
to read and which are mostly disregarded in reproductions of the data. You 
cannot do this with Bennett’s data. His less reliable figures on total food dis- 
appearance in calories and on the cereal-potato ratio to total food are pre- 
sented either with intervals of 200 calories per day for total food disappear- 
ance or in percentage ranges such as 50-60, 60-70, and the like for the cereal- 
potato ratios. These intervals and ranges are inserted in both the tables and 
charts and no one can fail to notice them.

Bennett’s data support by broad statistical analysis the commonly known 
fact that with due allowance for differences in average weight of the people 
and in climate, the variations in total intake of calories in the form of food 
are relatively small. Except for several Asiatic countries, total disappearance
of food is within the rather narrow range of 3,600 to 4,000 calories per adult
male. However, the proportion of cereals and potatoes in total calories dis-
appeared varies from 30-40 in the United States, Canada, United Kingdom,
Sweden, Switzerland, Australia and New Zealand to as much as 80-90 in 
Russia, Rumania, several Asiatic countries, including China and India, and 
also in a few African countries. 

The proportion of wheat in total calories disappeared naturally varies 
even much more than that of calories in cereals and potatoes, from almost 
half of all calories in Bulgaria to nothing in Nigeria and practically nothing 
in Java, French Indo-China, and several other countries. 

The last section of the study compares the changes in per capita disap- 
pearance of wheat flour from 1923-28 to 1933-38 in the various countries. 
More countries show declines than rises and all the most important wheat 
consuming countries, such as the United States, United Kingdom, France, 
Italy, and Germany, are among those showing declines. (No such comparison 
could be made for Russia.) 

The only doubt the reviewer has concerns the wisdom of computing the ratio of
per-capita wheat flour disappearance to per-capita disappearance of total 
food in calories in Chart 2, by using not only the caloric value of actually dis- 
appeared total food but also a constant figure of 3,000 calories. 

N. JASNY 


Washington, D. C. 


Statistical Methods for Research Workers (Eighth Edition), by R. A. Fisher.
Edinburgh: Oliver and Boyd. 1941. xv, 344 pp. 16 shillings.


In preparing the eighth edition of this standard work, Professor Fisher 
has followed his usual custom of adding new sections describing more re- 
cently developed techniques, while making little or no change in the material 
contained in previous editions. The principal addition consists of an exten- 
sion of the section on the use of discriminant functions. A discriminant func- 
tion is that linear compound of a set of measurements which best distin- 
guishes between a number of groups, in the sense that the ratio of the mean 
square between groups to the mean square within groups is maximized. In 
the seventh edition Fisher outlined the calculations required to construct 
the discriminant function, and described a number of practical applications 
of this new tool. The present edition contains a numerical example illustrat- 
ing an approximate test of significance of the difference between the “best” 
discriminant and the discriminant obtained from any given linear compound 
of the measurements. Thus we may test whether the “best’’ discriminant 
differs significantly from an index which might be constructed either from 
theoretical considerations or by assigning on inspection an arbitrary set of 
weights to the various measurements. This test should be illuminating to 
those readers who wish to understand the relation between the discriminant 
function approach and previous methods of handling the same type of prob-
lem.

The preface contains an interesting discussion of the extent to which the
book is dated. In the author’s opinion, this is most evident in the order of 
presentation of the topics; in particular, he suggests that the analysis of 
variance might profitably be presented sooner and developed more fully. 
As a step in this direction, paragraphs have been inserted in this edition 
indicating the relation of the t-test to the analysis of variance tests of sig- 
nificance, though the detailed discussion of the analysis of variance still 
follows the section on intra-class correlations, and Professor Fisher excuses 
himself from the heavy task of a complete revision. Owing to the author’s 
insistence on giving exact formulae rather than approximations wherever 
possible, the book remains as up-to-date a laboratory manual as any in the 
literature. From the point of view of its use as a text-book, however, the 
arguments in favor of a rearrangement will increase in force as the editions 
grow with the years. 

Presumably on account of wartime restrictions on the use of paper, the 
type has been reset so that the lines of print are closer together. The size 
of type is unchanged, however, and the book is still very comfortable to the 
eye. Some printing errors have unfortunately crept in during the resetting; 
mistakes were noted in the formula for k, on p. 70, for a on p. 145, for z on 
p. 191, for the value given to the formal variate y in the group of males on 
p. 280, and for the additional information on p. 320. 

W. G. COCHRAN


Iowa State College 


Factor Analysis, by Karl J. Holzinger and Harry H. Harman. Chicago: 
University of Chicago Press. 1941. xii, 417 pp. $5.00. 


Although factor analysis had its greatest development in the field of psy- 
chology, its application to many other branches of the social and physical 
sciences is becoming so wide-spread that a book dealing with the general 
aspects of this subject is not only timely but highly welcome. 

The book under review is a valuable contribution for several other reasons: 
First, the authors have made a determined attempt to define the scope and 
limitations of the technique of factor analysis; second, they have synthesized 
the major known methods of factorial decomposition; and third, they have 
developed the subject matter with great clarity and with much detail so that 
the book can be followed without too much difficulty by people with limited 
mathematical training. 

Except for a few minor omissions, the book is self-contained. Not only does 
it cover all of the important aspects of factor analysis that have been con- 
sidered in the past few decades in the literature, but each topic is thoroughly 
digested with detailed numerical examples, detailed methods of computa- 
tion, and complete proofs given either in the body of the text or in an ap- 
pendix. 

From the point of view of the uninitiated, the mathematical and geo-
metrical treatment in this book of the subject of factor analysis is the best
that has yet appeared. The main contribution of this book, however, lies not
so much in the algebraic and geometrical treatment as in the handling of the 
subject matter as a tool of research. 

As is well known and is frequently emphasized by the authors, there exists 
no unique method of resolving a set of statistical variables into factors. 
(From an algebraic point of view, this is another way of saying that there 
exists no unique method of reducing a quadratic form into a sum of squares.) 
It follows therefore that the criterion for what the authors called a “pre- 
ferred” system of factors must be based on considerations other than mathe- 
matical. 

From a strictly scientific point of view, the ideal preferred system of fac- 
tors is one which is deduced from an a priori hypothesis concerning the 
characteristics which the variables under analysis are supposed to measure. 
Once a system is derived in this manner, statistical tools can then be de- 
veloped to test the agreement between the hypothesis and the observed 
facts. An outstanding example of this type of factor analysis is given by 
Spearman’s “Bi-Factor” theory of intelligence. 

Generally speaking, however, the technique of factor analysis has hereto- 
fore been used more as a tool for discovering hypotheses than for testing 
hypotheses which have been arrived at on a priori grounds. Such an applica- 
tion of this technique is necessarily full of pitfalls. The authors of this book 
have given a great deal of attention to this and similar problems and have 
arrived at several rational methods of judging the system of factors which is 
to be preferred. 

Insofar as factor analysis is a tool of statistical analysis, the problem of the 
significance of the results obtained from a sample of observations assumes 
great importance. Unfortunately the problems involved in the development 
of tests of significance are rather complicated, and, to date, only scanty 
progress has been made in this field. The book, for example, contains several 
approximations to the standard errors of the quantities involved and some 
indications of tests of significance are given. However, these are at best 
rough and cannot be relied upon always to give satisfactory answers. Thus, 
at the present stage of development, an investigator must depend to a large 
extent on his own judgment and a knowledge of the variables analyzed. The 
same holds true in regard to the problem of estimating communalities. These 
communalities are in a sense the cornerstone of the whole technique of fac- 
tor analysis as developed in this book, and several methods are described for 
their estimation. If used with discrimination and mature judgment, these 
methods will lead to fruitful and sensible results. This fact is well exemplified 
by the manner in which the authors themselves have handled them. How- 
ever, in the final analysis, the methods proposed are mostly based on rule-of- 
the-thumb criteria and will remain as such until factor analysis passes from 
a tool for hit-and-miss experimentation to a tool for testing well-thought-out 
theories concerning the structure of the variables analyzed. 

M. A. GIRSHICK 


U. S. Department of Agriculture 


The Analysis of Economic Time Series, by Harold T. Davis. Bloomington, 
Indiana: The Principia Press, Inc. 1941. xiv, 620 pp. $5.00. 


The difficult problem of analyzing and interpreting economic time series 
has received its greatest attention during the past decade. Professor Davis, 
therefore, has rendered a distinct service in setting forth in this volume the 
present status of research in this field. The book is concerned chiefly with 
developing and analyzing the components of economic time series such as 
trends and cycles. In this connection considerable attention is given to the 
technique of harmonic analysis and its application to economic time series. 
The measurement of the energy in the motion of time series, or the variance 
attributable to the components relative to the total variance of the series, is 
a central feature of the analysis. While the treatment of these problems is 
presented in some detail and with mathematical precision, important aspects 
of the subject are inadequately covered. Problems of discovering and inter- 
preting relationships among time series and the associated problems of 
significance play but a minor and unsatisfactory part in the analysis. It must 
be pointed out that to follow the details of this work requires a considerable 
knowledge of mathematics, particularly function theory. The treatment of 
the material from this standpoint is elegant and those familiar with this 
field should find the book stimulating and thought provoking. 

The first chapter of the book presents a concise history of the problem and 
serves as a splendid introduction to the treatment given in detail in the 
chapters that follow. Two chapters are devoted to the theory of harmonic 
analysis and its application. A discussion is given of the analysis of trends 
which includes applications of the logistic curve to the study of population 
growth. Other subjects treated in detail are serial correlation, theory of 
random series, the degrees of freedom in economic time series, and the nature 
of income and wealth. 

A major interest in these techniques lies in their utility in forecasting 
economic variables, a problem considered in the closing chapters of the book. 
The discussion at this point is inadequate since it throws no light on the 
problems that must be solved in forecasting economic time series. Variations 
in the movement of such series are usually dependent on many factors some 
of which are measurable and others non-measurable. Criteria for the selec- 
tion of the factors, the form of relationship among the variables, and tests 
of the reliability of the forecasts are essential in predicting the future of 
economic variables. This phase of the problem is not touched upon in this 
volume. Despite these shortcomings, Professor Davis has made a distinct 
contribution in making available to the student a brilliant presentation of 
powerful techniques of analysis which promise much for the future develop- 
ment of economics as a science. 

Louis J. PARADISO 


U. S. Department of Commerce 


A First Course in Statistics, by E. F. Lindquist. Boston: Houghton Mifflin 
Company. 1942. xiii, 242 pp. $2.50. 


Study Manual for a First Course in Statistics, by E. F. Lindquist. Boston: 
Houghton Mifflin Company. 1942. 117 pp. $1.00. 


This text, with its lack of exercises, impresses one as being too verbal and 
descriptive. This can be explained, however, in the light of the purposes as 
stated in the preface. The author states that he wishes to “stress as much 
as possible the uses and interpretations and minimize as much as possible 
the mathematical theory of statistics and the mechanics of computation.” 

While the aims are worthy, it is difficult to see how a sound appreciation 
of statistical procedures can be achieved or thoroughly critical attitudes can 
be developed without some training in the mathematical bases of the sub- 
ject. Furthermore, one may fairly question the inclusiveness of these aims, 
since it is highly desirable that students gain some adequate facility in the 
use of all basic processes. The correct choice of procedures and the precise 
manipulation of techniques are certainly essential aspects of interpretation. 

The treatment is unique for an elementary text and one which is timely. 
The emphasis on student activity in logical analysis, the stress placed on 
sampling theory and especially the techniques suitable for small samples, the 
use of statistical methods, and the systematic development of essential 
information, make for a valuable contribution which may help to reshape 
textbooks in this field and on this level. The text is also to be commended 
for its clear definition of the “integral measure,” its presentation of definite 
steps in computational procedures, its recognition of the fact that relatively 
few of the distributions actually encountered are “normal,” and its common- 
sense interpretation of reliability and the coefficient of correlation. 

There is a tendency toward formalization in treating the determination of 
the number and size of intervals, a lack of presentation of operations suit- 
able for varied types of data such as attributes and skewed distributions, and 
a failure to acquaint the student with variations in processes of computation. 
Some may also question the advisability of omitting such topics as the har- 
monic mean, the geometric mean, binomial expansion, index numbers, non- 
linear correlation, and many of the applications of the normal curve theory 
used in test construction. 

The manual, as indicated by the title and a statement in the Foreword, is 
designed for use with the text. One may doubt that it “cannot be used with 
other texts,” as most of it should be clearly comprehensible to the careful 
student of other texts in education and psychology. One normally anticipates 
that a manual which accompanies a statistical text will be characterized by 
many specific exercises calling for much practice in computation. In this 
case, however, the manual is almost as verbal as the text, since the exercises, 
for the most part, call for verbal explanation and interpretation. 

In places one detects a tendency for these questions to become highly 
involved in minor issues, or to deal with points which have little general 
significance. The instructor, of course, has freedom of selection. The pages 
are perforated and punched, making it easy to hand the assignments in and 
later to bind them in a notebook. 

Many experienced instructors find it difficult to adapt themselves to the 
use of a manual constructed by another, chiefly because of variance of con- 
victions as to where emphases should be placed. The manual will prove of 
great assistance to those who wish to stress logical analysis and interpreta- 
tion, even though they find it advisable to supplement it with practical 
exercises for the purpose of developing facility in operations by the student. 

Paul V. WEST 


New York University 


Investment and Business Cycles, by James W. Angell. New York: McGraw- 
Hill. 1941. xviii, 363 pp. $3.00. 


Professor Angell’s interesting study falls into three logically distinct, but 
closely related parts: First he attempts to set up a general hypothesis which 
will explain the appearance and reappearance of what the author considers 
the core of observed cyclical fluctuations in modern economic activity, 
“self-generating business cycles”; then he proceeds to examine the relation- 
ships between changes in the money stock, the volume of new investment, 
national money income and the volume of employment, laying stress upon 
the circular velocity of money as an explanatory mechanism, but also giving 
an illuminating discussion of the “multiplier” concept; finally he discusses 
the problems connected with government spending and the financing of the 
defense program, as seen from the theoretical background given earlier. 

Professor Angell rejects any attempt to explain the similar cyclical be- 
havior found in many economic time series by reference to “exogenous fac- 
tors,” as was done by Schumpeter, Moore, et al. Instead he sees inherent 
in any system of relatively free enterprise a set of forces which make for 
persistent and roughly simultaneous up and down fluctuations in employ- 
ment, national income, investment, etc., fluctuations which would occur in 
a never ending “self-generating” sequence even if all outside factors were 
stabilized. As he sees this mechanism, cyclical changes in national money 
income are mainly produced by changes in the volume of new investment; 
this latter depends in turn upon the general level of anticipations in the 
previous period; and this in turn hinges largely on the rate of change of in- 
come during a still earlier period. Considered from a mathematical point of 
view, these relationships do not necessarily imply a cyclical movement of 
income, but are also consistent with indefinite monotonic increase or de- 
crease. The upward movement must come to an end because of some or all 
of three sets of factors: disproportionately rising cost-price ratios; saturation 
of short run investment activity because of accumulated investment in the 
recent past; and—presumably most fundamental—the fact that the middle 
and upper income classes cannot rationally continue indefinitely to spend 
all their increases in income on consumption and investment, so that they 
must begin to hoard after a certain point. These all make for a decline in the 
rate of increase of income, a fall in the level of anticipations, and the start 
of the downturn. The converse process occurs in the transition from trough 
to upturn. 

In the second part of this study, stress is laid upon the changed relation- 
ships between national income and money stock as between 1899-1929, 
1929-33, and 1933-39. The altered money-using habits of the American 
people which these changes reflect, Professor Angell suggests somewhat 
tacitly, may be a fundamental clue to the altered nature of our economic 
problems since 1929. 

The acute observations on the current scene, particularly the suggestion 
that systematic manipulation of government spending and taxing powers 
may be means of stabilizing economic activity, are extremely interesting, 
and offer valuable clues for possible government action to win the peace 
after the present conflict. 

There is little one can say in just criticism which has not already been 
anticipated and considered by the author. He admits he does not deal with 
the whole of cyclical economic reality. In fact, in his later chapters Pro- 
fessor Angell lays to deficit spending main responsibility for the 1934-36 
upturn in this country. He recognizes the difficulty of statistical verification 
of his hypothesis, and concedes that the numerical coefficients defining his 
model relationships would probably differ considerably from period to 
period, reflecting largely the impact of the exogenous forces. 

As the United States goes through its present trial, and we see the all too 
patent relationship between the present upturn in production and employ- 
ment, and a very specific “outside” factor, there is a temptation to reject 
offhand any such “self-generating” hypothesis as offered here. Moreover it 
seems unlikely that in the future there will be quite the same type of eco- 
nomic freedom as that which Professor Angell sees as having a primary causal 
significance in making the downturn inevitable. Nevertheless this study has 
value as an attempt toward logical explanation of the cyclical activity ex- 
perienced under capitalism and perhaps as a guide to future action to modify 
the “self-generating” possibilities of an economy in which there is both 
inequality of income and freedom of investment opportunity. 

Harry SCHWARTZ 


Brooklyn College and War Production Board 


Deficit Spending and the National Income, by Henry H. Villard. New York: 
Farrar and Rinehart. 1941. xviii, 429 pp. $3.50. 


The economic effects of public expenditure when resources are fully em- 
ployed, as they tend to be in a wartime economy, are naturally quite differ- 
ent from its effects in time of depression. Mr. Villard’s book represents the 
most comprehensive review of the latter problem, both theoretical and 
practical, which has so far appeared. In periods of underemployment may 
not government spending prove an effective method of raising the level of 
economic activity, perhaps even of securing permanent prosperity? This 
question is scarcely topical at the moment, but its practical interest seems 
certain to be revived in the future, perhaps suddenly and acutely. After a 
review, both thorough and discriminating, of the analytical literature of the 
past decade relating to the multiplier, the author declares that the case for 
cyclical deficit spending has been established. This is defined as spending 
which leaves the national debt unchanged over the course of an entire busi- 
ness cycle. On the other hand he rejects the plea sometimes offered that 
continuous or progressively increasing public investment is necessary to 
offset the decline in population growth and other long term factors leading 
to the stagnation of investment, preferring if necessary to discourage the 
propensity to save by appropriate measures of taxation; “only to the extent 
that a reduction in... savings is unable to solve the problem of secular 
unemployment does there seem to be a case for permanent deficits.” How- 
ever, he would presumably admit improvements financed by loans which 
led to secular increases in the national debt if such improvements were 
desirable in themselves. 

To the statistician the main interest of the book will probably center in 
Villard’s derivation of monthly estimates of net income-increasing expendi- 
ture for all governmental units for the period since 1929. In the case of the 
federal component, the adjustments made to the crude cash deficit (to ex- 
clude revenue which is not income-decreasing and expenditure which is not 
income-increasing) are for the first time explained and justified in detail. The 
new monthly series for the combined cash deficit of state and local govern- 
ments, derived from changes in outstanding debt, will be of great value 
even though the coverage of cash assets and short term liabilities is admit- 
tedly incomplete. The substantial discrepancies between Villard’s series for 
this item (Table 15) and the official estimates on an annual basis presented 
to the Temporary National Economic Committee (original data, Hearings, 
p. 4011 (Part 9); revised data, Monograph 37, p. 111) are disturbing. How- 
ever, the facts that the Treasury have not published their methods, and that 
Villard is able notwithstanding to make what appear to be damaging criti- 
cisms of these methods, suggest that his estimates are preferable to the 
official ones. 

Harold BARGER 


Columbia University 


Paying for Defense, by A. G. Hart, E. D. Allen, and others. Philadelphia: 
The Blakiston Company. 1941. viii, 272 pp. $2.50. 


Paying for Defense in substance does two things: it sets forth the im- 
mediate economic and financial problems which the war has presented to 
the people of the United States, and it indicates the direction in which the 
authors believe the answers are to be found. The book is simply and directly 
written, and while much careful and scholarly research went into it, it is 
clearly designed for the layman as well as for the economist. As must be the 
case with a volume that deals with a rapidly changing current situation, 
certain of the facts and figures are somewhat out of date, but this circum- 
stance does not diminish the usefulness of the study. 

The presentation of the basic problem of war economics, how to furnish 
the war effort with the necessary men and material and at the same time 
provide the requisite funds in such a way as to minimize price increases and 
inflationary dangers, is excellently done. Few persons will cavil at the 
authors’ definition of war objectives—getting maximum output; preventing 
inflationary general price increases; sharing defense burdens fairly; giving 
all citizens a sense of sharing in defense; releasing resources needed for de- 
fense; promoting a healthy financial structure. In analyzing the possible 
ways of achieving these goals the authors bring to bear their familiarity 
with recent economic thinking and research, and explore the different types 
of taxation and borrowing, indicating the chief advantages and disadvan- 
tages of each. 

The authors believe that the country is most likely to obtain the desired 
objectives through a broadening of the tax base, a deduction at the source 
income tax, and a flexible income tax rate which would be adjusted upward 
when increases in the cost of living evidenced the presence of excess purchas- 
ing power and the desirability that it be absorbed by the government. The 
economic reasoning that underlies this predilection is hardly to be ques- 
tioned. Yet it may be doubted whether the analysis explores sufficiently the 
administrative feasibility of the proposal, and whether the study gives 
enough attention to the difficulties that business concerns would face were 
they suddenly to find themselves the principal tax collectors of the Federal 
Government. These topics are considered in the book; but the very real 
problems which lie in these areas are not to be minimized. 

The authors are to be congratulated both on the skill and the speed with 
which they have analyzed in terms that the general public can comprehend 
the issues of war economics. 

Charles Cortez ABBOTT 


Harvard University Graduate School of Business Administration 


The Flow of Business Funds and Consumer Purchasing Power, by Ruth P. 
Mack. New York: Columbia University Press. 1941. xvii, 400 pp. $3.75. 


In this long and rather difficult to read book, Mrs. Mack analyzes the 
relationship between the flow of funds through business enterprises and the 
volume of consumer purchasing power. Chapter I is introductory. Chapters 
II-VI, occupying half the book, discuss selected sources and uses of funds 
and changes in some balance sheet items of fifty-four corporations, princi- 
pally for the years 1934-38. This analysis is unusually full and detailed. One 
of the principal findings, that gross retained income during 1934-38 was more 
than adequate to meet total capital expenditures related to that income— 
despite the accumulated replacement need, due to wear and tear during the 
depression, despite rapid technological changes, and despite the not un- 
usually high profits for the period—supports a similar proposition presented 
in the Temporary National Economic Committee’s Hearings on Savings and 
Investment. Chapters VII and VIII, occupying one-quarter of the book, are 
based largely on interviews with eighty-six corporation executives. These 
chapters get at the “judgment factors and business techniques determining 
the depreciation accrual and capitalized expenditures for fixed assets”; they 
are extremely rich and interesting. Chapters IX and X, occupying the last 
quarter of the book, attempt to sketch the theoretical significance of the 
flow-of-income analysis, and the possibilities of using this analysis for main- 
taining full employment. Though these chapters are somewhat jerky and un- 
polished, they are probably the most important in the book. 

Mrs. Mack is principally concerned with investigating the hypothesis that 
non-financial business enterprises may feed purchasing power to consumers 
at different rates than they are prepared to offer consumers’ goods for sale. 
She is also concerned with the larger question whether consumer purchasing 
power in general is adequate. To analyze these propositions, Mrs. Mack 
develops the conception of “markets,” or “economic areas within which in- 
come actually constitutes likely spending power in the same field.” “The 
relations between income and expenditure within markets have certain 
characteristic patterns that may not be shared by other markets.” Five 
markets are identified: consumer, industrial, financial, government, and 
foreign. The suggestion is made “that the transmarket spending of income is 
far more erratic and jumpy than is, for example, the flow of consumer income 
into consumers’ purchases, or money invested in one sort of security into 
other securities or banks.” 

These thought-provoking hypotheses are not, however, conclusively 
tested. The calculations in Chapter IX, dealing with the estimated source 
or application of funds between the consumer and industrial market during 
1932-38 are admittedly rough and based upon arbitrary allocations. On the 
other hand, the proposals for analyzing the flow of income with quarterly 
and monthly data are very suggestive. 

Oscar L. ALTMAN 


National Resources Planning Board 


Banking Operations in Ohio, 1920-1940, by J. M. Whitsett. Columbus: 
The Ohio State University, Bureau of Business Research. 1941. xxv, 217 pp. 


This monograph presents a statistical study of the operations of state and 
national banks in the State of Ohio for the years 1920 to 1940. The statistical 
tables presented are compiled from published reports of the Comptroller of 
the Currency and from both published and unpublished material in the 
Division of Banks of the State of Ohio. 

There are six chapters dealing with (1) banks and the economic structure 
of Ohio; (2) changes in the banking structure of Ohio since 1920; (3) changes 
in banking practice in Ohio; (4) trends in earnings, expenses, and profits of 
Ohio state and national banks, 1921-1939; and (5) profitableness of Ohio 
state banks by size groups, 1930-1939. 

This list of subjects represents an ambitious undertaking; it is not sur- 
prising perhaps that it raises hopes in the reader which are disappointed in 
the reading. The intriguing subject of “banks and the economic structure of 
Ohio,” for example, is disappointingly discussed with some six pages of text 
in which for the most part materials are summarized from published sources 
that are a decade or more of age, and some of which represent broad generali- 
zations on the national economy. The applicability of these generalizations 
to the economy of Ohio is not tested or commented on. The significance of 
“Ohio as an agricultural state” to banking is given only 16 lines, and 
“changes in marketing” (presumably in Ohio) are presented by a summary 
of the conclusions in Recent Economic Changes, published in 1929! 

The chapters on banking operations and practices present material on 
number and resources of banks, the character of their assets and liabilities, 
and their earnings, expenses, and operating ratios, broken down by state 
and national bank classes; this material should provide both interesting and 
useful standards by which individual banks in the state could appraise their 
own position and performance. The treatment of these materials in the text 
is somewhat cumbersome, but there are a number of skillfully drawn charts 
which facilitate interpretation. 

Ernest M. FISHER 

American Bankers Association 


Exchange Control and the Argentine Market, by Virgil Salera. New York: 
Columbia University Press. 1941. 283 pp. $3.50. 


Shortly after the conclusion of the Ottawa Agreements a treaty was 
negotiated between Great Britain and Argentina providing for preferential 
treatment of Argentine meat exports and for a virtual balancing of the two 
countries’ accounts which previously had been heavily in favor of Argentina. 
The main purpose of Salera’s book is to show that the Argentine exchange 
control system was employed as a means of favoring British exports and 
discriminating against the United States by allotting favorable exchange 
rates to the former while forcing the latter to come in at the less favorable 
“free market” rate. 

These discriminatory measures frequently appeared in disguise, and 
Salera does an excellent job in uncovering their effect upon the competitive 
status of British and American products in the Argentine market. In general 
the book provides much information that is becoming increasingly important 
to us at the present time. 

The author’s aversion to bilateralism, however, leads him to do less than 
justice to Argentina’s economic policies. It is true, of course, that Argentina 
failed to obtain many of her imports from the cheapest supplier. On the 
other hand, the close association with the prosperous Sterling Block brought 
advantages which a trade agreement with the United States could never 
have matched. It is only since the outbreak of war that her dependence on 
Europe has become a serious drawback for Argentina. 

In the handling of monetary and exchange problems the Argentine Gov- 
ernment has shown ability of a very high order under difficult conditions. 
All through the depression Argentina was dependent on the export of agri- 
cultural products, while protectionism on the part of her customers in- 
creased. Nevertheless Argentina, by common consent, weathered the de- 
pression better than most other countries. She maintained a fairly stable 
price level throughout this period and managed to service her foreign debt 
without interruption. To some extent this performance, almost unmatched 
in Latin America, was due to fortunate circumstances, such as good harvests 
in years when there were crop failures in the northern hemisphere. But 
beyond this it has been due to three deliberate acts of policy: (1) abandoning 
the gold standard as early as 1929, thereby shielding the country against 
deflation, (2) the adoption of the dual market system of exchange control, 
which provided a safety valve in the form of the “free market” rate and 
prevented the creation of a rigid and artificial price system, and (3) the 
close association with the Sterling Block. A somewhat more appreciative 
evaluation of these policies would have added to the merits of Salera’s book. 

Henry C. WALLICH 


Federal Reserve Bank of New York 


Interamerican Statistical Yearbook, 1940. Edited by Raul C. Migone with 
the assistance of Marcelo Aberastury, Emilio Fuente, and Jorge E. Itur- 
raspe, under the auspices of the Comision Argentina De Altos Estudios 
Internacionales. New York: The Macmillan Company; Buenos Aires: 
El Ateneo; Rio de Janeiro: Freitas Bastos & Cia. 1940. 612 pp. 


The Interamerican Statistical Yearbook, 1940, is a particularly useful 
pioneer venture undertaken under the auspices of the Argentine Commission 
of International Studies by a staff directed by the distinguished Argentine 
official and scholar, Dr. Raul C. Migone. Its announced purpose is to “serve 
the American community, holding before it a faithful mirror in which 
America may see herself as she is” (p. 24). 

The data used were taken in most instances from publications of inter- 
national organizations rather than directly from reports of any one country. 
The editors state that “all the data possible to obtain from official inter- 
national sources” have been included in the volume. The content therefore 
reflects the relative availability of data on different subjects in such sources. 
About half of the volume consists of tables on international trade. The 
remaining half includes large sections on public finance, demography, and 
agricultural, mineral, and manufacturing production; and smaller sections 
on wages and hours of labor, prices, banking and money, transportation 
and communication, education, health, defense, and participation in inter- 
national organizations and agreements. Subjects that are omitted because 
of lack of international data on a comparable basis include, among others, 
domestic trade, national income, occupations and employment, construction 
and housing, elections and extent of domestic political units, publications 
and libraries, and the extent of cultural organizations for concerts, theaters, 
and the fine arts. In arrangement, in attractiveness and presentation, the 
compilers are to be commended. The large clear print is also notable. 

The textual matter, including heading, stub, and notes, is in each of the 
four official languages that exist in the American countries; namely, Spanish, 
English, Portuguese, and French. While the problems of arrangement and 
space occasioned by use of four languages for purposes of international 
courtesy and increased popular usefulness have on the whole been admirably 
solved, the elimination of English and French in future editions should be 
considered. Advantages in added space for statistical content might out- 
weigh the bother of dictionary reference by the few users who cannot read 
either Spanish or Portuguese, and the present text in British idiom is not 
always useful to North American readers. 

There is a question whether the distinguished and competent editors 
might not discard advantageously their self-imposed handicap of depending 
almost wholly upon other international statistical compilations for their 
data, and undertake to obtain as much data as possible directly from the 
American governments or from their official reports. Although some addi- 
tional work would be involved, the advantages of having a new independent 
source of reliable statistical information about the American nations would 
be enormous. Data would be more recent than is possible when obtained in- 
directly through other published compilations. Significant non-confidential 
data that exist only in relatively inaccessible or unpublished form in the 
capitals of the respective American countries would become available to 
the world at large for the first time. Perhaps also some nations might even 
begin the collection of information never before obtained by them, if
requested to fill gaps in inter-American comparisons regularly included in an
authoritative statistical yearbook dedicated to serve the American com- 
munity. In any case, the Yearbook has become, with its first issue, a princi- 
pal source of inter-American information collated with international per- 
spective and high standards of scholarship. 

E. R. Gray 


Bureau of the Census 


Commodity Yearbook, 1941. New York: Commodity Research Bureau, Inc.
1941. 636 pp. $7.50. 


This reference work is the third in a series designed to provide both de- 
scriptive background and statistical records covering a wide range of com- 
modities. It is prepared by a group of editors in New York who have had 
considerable experience in commodity markets, and who have utilized a 
variety of sources, official and unofficial, for this compilation. Each
yearbook has given special emphasis to some aspect of commodity markets. Thus
the 1940 edition covered up-to-date processing methods by which raw ma- 
terials are converted into finished form, and the 1941 edition, in addition to 
a brief description of each commodity, discusses the source of supply of 
raw materials, the channels through which they are marketed, and some of 
the factors affecting supply and demand. The individual commodity sec- 
tions, of which there are 75 in the 1941 Yearbook, vary in length from 3 or 4 
pages for the less important commodities, to 10 to 25 pages for major markets 
such as wheat and flour, cotton, and steel. The statistical tables, which 
cover production, consumption, exports and imports, stocks, prices, and 
other relevant data, are exceedingly useful for reference purposes. The 
descriptive material, written in a highly readable style, is by no means ex- 
haustive but indicates as a rule a considerable degree of familiarity on the 
part of the editor with the commodity in question and covers the high lights 
of the market. While such a yearbook cannot, of course, provide any great 
detail, it does give within the covers of a volume answers to a great many 
questions which sometimes require hours of searching through scattered 
sources. 

The Commodity Yearbook for 1941 also has an excellent introductory 
section composed of three chapters. The first, on war-time control of com- 
modities, summarizes the economic problems arising out of commodity 
regulation in World War I and the manner in which they were handled by 
various governmental agencies; the second is a clear description of the opera- 
tion of commodity exchanges; and the third, on war and commodity prices, 
indicates the major factors affecting prices in war time. 

Beyond these introductory chapters, there is little orientation toward the 
problems arising out of the war, since the Yearbook went to press early in 
1941, and in this respect it may prove disappointing. Practically speaking, 
the statistical record ends with the year 1940, although some of the statisti- 
cal tables cover the first quarter of 1941. In reading the individual com- 
modity chapters in the light of the development of the defense program, 
there is a notable absence of a critical analysis of the shortages of supply 
which war abroad and defense at home have produced. This is especially 
true in the case of some of the metals—iron and steel, lead, aluminum. It is 
recognized, of course, that the industrial developments of 1940-41, after the 
beginning of the defense program, have greatly changed the supply picture. 
For a few commodities, moreover, certain phases of marketing and supply 
appear not to have been adequately dealt with, although these are the
exception rather than the rule. Milk is an illustration. Here neither the 
complex price-fixing procedure of state and federal regulatory agencies nor 
the buying policy of milk distributors is given adequate attention. From 
the point of view of the technical statistician, the tables would be improved 
by somewhat longer explanatory footnotes. 

It is to be hoped that the 1942 edition will contain as much material as 
possible on the impact of the war on specific commodity markets, both as 
regards supply and demand and current techniques of government regula- 
tion. There is no doubt that the public will be greatly concerned with 
commodities for the duration of the war and that a current, up-to-date com- 
pendium of this kind should prove invaluable. 

ARYNESS JOY 


Bureau of Labor Statistics 


Trends in Retail Trade and Consumer Buying Habits in the Metropolitan 
Boston Retail Area, by Richard P. Doherty. Boston: Bureau of Business 
Research, Boston University College of Business Administration. 1941. 
40 pp. $1.00. 


This study is an attempt to explain, on the basis of a consumer survey, the 
decentralization of retail sales away from the downtown stores in Greater 
Boston, as shown by the Censuses of Business for 1929, 1935, and 1939. The 
report offers the following substantive conclusions: 

1. The Census Metropolitan District area is smaller than the area
influenced by retail outlets in Boston proper.

2. Although the relative populations of Boston and the communities
included in a 30-mile radius have remained constant over the ten-year
period, there has been a decline in the percentage of sales accounted
for by Boston stores.

The distribution of food sales has remained almost constant, but other 
classifications have shown an appreciable shift to stores outside of Boston 
proper, particularly in the case of automobile sales. The shifts in trade were 
presumably away from stores in the City of Boston to stores in the adjacent 
neighborhood area. 

The consumer survey develops five reasons for consumer purchases in 
Boston stores by residents of outlying communities: better style, larger 
assortments, better quality, reliability of store, and price, including bargain 
sales. 

Six reasons are given for purchases in stores within the “home” area: con- 
venience in purchasing near home, reliability of stores, quality and style, 
delivery and charge service, acquaintance with proprietor or local loyalty, 
and location of stores with regard to theatre and other shopping facilities. 
A more detailed analysis of these shopping motives is given for each of the 
various commodity lines studied. 








The report offers little new in technique, but, in its analysis of census 
data, offers a convenient basis for further analysis and application of infor- 
mation. Were similar reports made for other communities, the data would be 
even more useful. The biggest question in the mind of the reviewer has to 
do with the consumer survey which reinforces the analysis of census data. 
A questionnaire is mentioned, but no copy is submitted. No statement is 
made as to whether the survey is based on interviews or on a mail canvass. 
No statement is given as to the source of the list or the basis of distribution 
of interviews. If interviewers were used, the reader would be more confident 
if some word was given as to their selection, training, and supervision. More- 
over, the consumer reactions reported are so general that some doubt may be 
justified as to whether the interviewing technique penetrated below ration- 
alization. 

LAWRENCE C. LOCKLEY


Curtis Publishing Company 


Statistical Cost Functions of a Hosiery Mill, by Joel Dean. Chicago: The
School of Business, University of Chicago Press. 1941. ix, 166 pp. $1.00.


This publication presents the results of one of a series of investigations by
Professor Dean of the cost behavior of individual enterprises “with a view to
obtaining valid generalizations concerning cost behavior under varying
conditions.” The study follows the pattern of multiple and partial correlation
analyses heretofore used principally in demand function studies. Regressions
were computed between output and each of the following: overhead cost,
non-productive labor cost, productive labor cost, and combined total cost.
Linearity was assumed in each case, and time was used as a third variable in
each set of correlations. The partial regressions of time on the dependent or
cost variable enabled the author to remove the influence of improvements in
managerial performance and to reduce the scatter about the output-on-cost
regression lines. Estimates of marginal and average costs were then made
directly from the regression functions.
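
The computation described above may be sketched as follows. This is a minimal illustration and not Professor Dean's actual procedure: it assumes a cost function linear in output and time, fitted by ordinary least squares, and the function names `fit_linear_cost`, `marginal_cost`, and `average_cost` are hypothetical. Under that assumption the marginal cost is simply the coefficient on output, and the average cost is the fitted total cost divided by output.

```python
# Illustrative sketch only. Any figures used with these functions are
# hypothetical; none of them come from Dean's study.

def fit_linear_cost(output, time, cost):
    """Least-squares fit of cost = a + b*output + c*time.

    Solves the 3x3 normal equations (X'X) beta = X'y by Gauss-Jordan
    elimination and returns the tuple (a, b, c).
    """
    rows = [[1.0, float(x), float(t)] for x, t in zip(output, time)]
    # Augmented matrix [X'X | X'y].
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(3)]
        + [sum(r[i] * y for r, y in zip(rows, cost))]
        for i in range(3)
    ]
    for col in range(3):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(3):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return tuple(aug[i][3] / aug[i][i] for i in range(3))


def marginal_cost(b):
    """Under a linear cost function, marginal cost is the output slope b."""
    return b


def average_cost(a, b, c, output, time):
    """Fitted total cost divided by output, at a given output and time."""
    return (a + b * output + c * time) / output
```

Including time as a regressor is what allows a trend, such as gradual improvement in managerial performance, to be separated from the effect of output itself; the coefficient `c` absorbs that trend in this sketch.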

A careful series of tests was made for the reliability of the statistical 
analysis including the assumptions of linearity. Adjustments to remove the 
effect of changing input prices on costs had also to be provided where neces- 
sary. 

Although not introducing any techniques basically new, this work pre- 
sents the application of a simple statistical method to what is usually con- 
ceived a highly complex problem with apparently significant results. This 
end was of course facilitated by the simplicity of the production process 
considered. The elimination from costs of such items as depreciation, general 
administrative, and freight costs also tended to reduce the relative
importance of overhead which otherwise might well have disturbed the linearity
of the combined total cost-output regression.

O. J. McDIARMID


College of William and Mary 


The Petroleum Industry, by Ronald B. Shuman. Norman, Oklahoma:
University of Oklahoma Press. 1941. xiv, 297 pp. $3.00. 


Control of the Petroleum Industry by Major Oil Companies, by Roy C. Cook. 
Washington: Temporary National Economic Committee. Monograph No. 
39. xi, 191 pp. $1.00. 


Review and Criticism on Behalf of Standard Oil Company (New Jersey) and 
Sun Oil Company of Monograph No. 39, with Rejoinder by Monograph 
Author. Temporary National Economic Committee. Monograph No. 
39-A. vi, 96 pp. 

Professor Shuman’s book is a brief survey of the oil industry: production, 
refining, transportation, marketing, taxation, labor problems, international 
trade in oil products, and conservation, with chapters on the demand for oil 
products and on the natural-gas industry. It is not exhaustive in its discus- 
sion of any of these subjects, but the author has given a clear and well- 
balanced picture of the industry, with judicious conclusions on controversial 
points. Professor Shuman offers no particular indictment of the Standard 
Oil Company, or of the twenty major oil companies, or of the oil industry 
in general, but sketches briefly the reasons why the integrated major com- 
panies dominate the industry, and why the independents live precariously 
and often not long. Perhaps it is ungrateful to wish that the book were larger 
and more complete, but the reviewer closed it wishing for more. The author 
knows the oil industry well, and he writes with clarity, pungency, and oc- 
casionally with fine humor. One of the amusing anecdotes in the book, for 
instance, relates to the corruption of state officials in oil-proration, when one 
firm was “credited with attempting to bribe a functionary to remain honest, 
the intent being to circumvent the plans of a rival to move out several times 
his allowable.” The Petroleum Industry is a much more than ordinarily read- 
able book. 

Mr. Cook, a member of the staff of the Department of Justice, worked 
with the T.N.E.C. in the investigation of the oil industry, and his monograph 
is a study of the evidence presented before the T.N.E.C., and of other 
sources, a study devoted to the problem of monopoly control in the industry. 
He summarizes the evidence of monopoly control by the integrated major
companies, their 85 per cent of the trunk pipe line mileage, 96 per cent of 
the gasoline pipe line mileage, 87 per cent of the oil tankers, 75.6 per cent of 
the refining capacity, their almost complete control of patent companies, and 
their 52 per cent of crude production. He indicates, furthermore, that their
share of the business has been growing, and is likely to grow in the future. 
The marketing field, the most competitive part of the oil business, is left 
largely to individual operators, who are nevertheless under fairly complete 
control of the major companies, and apparently conduct their business at 
a loss. The author does not establish any general plan of cooperative or collu- 
sive action by the major companies, except in the activities of the American 
Petroleum Institute; but he does point out that they, or some of them, often 
act cooperatively, as for instance, in the ownership of patents and pipe
lines, and in some other matters. 

The reply of W. S. Farish, of the Standard Oil Company, and J. Howard
Pew, of the Sun Oil Company, is a general denial of the charge of monopoly, 
and of some of the conclusions of Mr. Cook; and Mr. Cook’s rejoinder, 
insisting on the soundness of his analysis, is a part of the same monograph. 

There is not space here to weigh all the evidence presented in these two 
monographs, but the reviewer would like to suggest that in their main con- 
tention, as to the question of monopoly, Mr. Cook and the oil men are both 
correct. Certainly the twenty major integrated companies do in a large 
measure “control” the oil industry; yet, just as certainly, there is competi- 
tion, sometimes aggressive competition, among the majors and with the 
independents. In such competition the unintegrated independents are 
gravely handicapped, usually, and he must be an optimist who can see a 
bright future for them; but that situation is characteristic of many kinds 
of business today. The concept of imperfect competition would fit nicely 
here, or perhaps “monopolistic” competition would be better, with reason- 
able emphasis on monopolistic elements in the business. At any rate, these 
two monographs will be welcomed by students of the oil industry. 

JOHN ISE


University of Kansas 


Eleven Twenty-Six, A Decade of Social Science Research, by Louis Wirth,
Editor. Chicago: University of Chicago Press. 1940. xv, 498 pp. $3.50. 


This book depicts the story of ten years of activity at the Social Science 
Research Building of the University of Chicago—1126 East 59th Street. It 
contains the addresses and papers given at the tenth anniversary celebration 
(in December, 1939) of the dedication of the building, the discussions at five 
round tables held at the time, and a bibliography of the publications of 
present and past members of the University of Chicago social science faculty. 

The volume is essentially a report of progress during a most important 
period in the development of social science in the United States. It shows the 
methods employed by outstanding social scientists, the wide range and inter- 
relation of current problems, and the efforts being put forth to provide 
solutions. 

More than half of the book is devoted to the proceedings of the tenth an- 
niversary meeting, among which the round table discussions are the most 
significant, especially as they throw light upon insistent questions still in 
dispute among social scientists. Two such insistent questions, out of a num- 
ber that are discussed in the proceedings, may be cited as illustrations. 

(1) Should quantification in the social sciences proceed wholly from the 
point of view of ordinary “extensive” measurement or are there other quan- 
tification criteria of even greater importance here? 

It was as a result of developments in modern physics, biology, and
psychology that the distinction between primary, secondary, and tertiary quali-
ties or levels of analysis was advanced. From the point of view of quantita- 
tive analysis, the first level may be called the primary physical-science level 
of extension, form, and time. The second level may be designated the bio- 
psychological level dealing with degrees of difference within “subjectively” 
perceived qualities such as sound, color, taste, or smell. The third level may 
be called the psycho-social level covering relational differences among such 
qualities as desires, interests, judgments, and values. 

(2) Should normative considerations have a place in social science or are 
they contra-indicated here as in natural science? 

It is repeatedly insisted that we should not be concerned about doing 
“good” in the social studies. And yet it should also be borne in mind here 
that there is what seems to be an essential difference in this respect between 
the social studies and the natural sciences. In the social realm we are con- 
tinuously doing “good” or “bad,” not in our investigations necessarily, but 
as active agents who are constantly changing the social structure as such. 
We make governmental rules and regulations, for example. We pass laws, 
municipal, state and federal. And thus, in these and in many other respects, 
we are shaping and reshaping the social order itself. We make the social 
facts as we go, and we do not do that in the natural realm, that is, if we are 
thinking of the basic structure of the material universe, of the astronomical 
relations between the planets, the physical relations in mechanics, the inter- 
action of the elements in chemistry, the constitution of the geological world, 
the genesis and evolution of the biological organisms with which we have 
contact. These are not changed by our being in the world in the sense that 
we have any power to change the laws governing them. But in being in the 
world (in historic time), man originally created the social order itself, and 
he has since repeatedly changed its essential constitution. 

The question here is whether what we investigate in the social realm is 
not something much more flexible and unpredictable than what we investi- 
gate in the natural realm. In physical and biological science we can confi- 
dently predict what is to happen as soon as we have ascertained the laws 
governing a certain set of natural phenomena, such as Kepler discovered 
with respect to planetary motion. But in the social studies we have no such 
extensive power of prediction. At the same time, if man does have power to 
change the social order itself, then what is “good” or “bad” for society be- 
comes a legitimate branch of social study. Ethics and normative standards 
may thus be significant for social science, whereas they may have no place 
in physical and biological science. 

Eleven Twenty-Six contains discussions by able social scientists on many
insistent current questions such as these. It will be welcomed by students 
and scholars interested in border-line problems and in keeping abreast of 
informed opinion regarding them. 

JOSEPH MAYER
Bureau of Labor Statistics 